Published in Vol 27 (2025)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/65146.
Unveiling GPT-4V's hidden challenges behind high accuracy on USMLE questions: Observational Study


Journals

  1. Kim S, Wihl J, Schramm S, Berberich C, Rosenkranz E, Schmitzer L, Serguen K, Klenk C, Lenhart N, Zimmer C, Wiestler B, Hedderich D. Human-AI collaboration in large language model-assisted brain MRI differential diagnosis: a usability study. European Radiology 2025;35(9):5252
  2. Azizoglu M, Escolino M, Kamci T, Klyuev S, Perez Bertolez S, Risteski T, Elhalaby I, Borkar N, Esposito C, Okur M, Lacher M, Mutanen A, Shehata S, Chiarenza F, Davenport M. Generative Artificial Intelligence Accuracy in Interpreting Forest Plots in Pediatric Surgery Meta-analyses: A Perspective From Pediatric Surgery Meta-analysis Study Group (PESMA). Journal of Pediatric Surgery 2025;60(7):162359
  3. Zhou S, Xu Z, Zhang M, Xu C, Guo Y, Zhan Z, Fang Y, Ding S, Wang J, Xu K, Xia L, Yeung J, Zha D, Cai D, Melton G, Lin M, Zhang R. Large language models for disease diagnosis: a scoping review. npj Artificial Intelligence 2025;1(1)
  4. Boltaboyeva A, Baigarayeva Z, Imanbek B, Ozhikenov K, Getahun A, Aidarova T, Karymsakova N. A Review of Innovative Medical Rehabilitation Systems with Scalable AI-Assisted Platforms for Sensor-Based Recovery Monitoring. Applied Sciences 2025;15(12):6840
  5. Mavrych V, Yousef E, Yaqinuddin A, Bolgova O. Large language models in medical education: a comparative cross-platform evaluation in answering histological questions. Medical Education Online 2025;30(1)
  6. Othman A, Sharqawi A, MohammedAziz A, Ali W, Alatiyyah A, Mirah M. Assessing the Accuracy and Completeness of AI-Generated Dental Responses: An Evaluation of the Chat-GPT Model. Healthcare 2025;13(17):2144
  7. Jaleel A, Aziz U, Farid G, Zahid Bashir M, Mirza T, Khizar Abbas S, Aslam S, Sikander R. Evaluating the Potential and Accuracy of ChatGPT-3.5 and 4.0 in Medical Licensing and In-Training Examinations: Systematic Review and Meta-Analysis. JMIR Medical Education 2025;11:e68070
  8. Kasagga A, Sapkota A, Changaramkumarath G, Abucha J, Wollel M, Somannagari N, Husami M, Hailu K, Kasagga E. Performance of ChatGPT and Large Language Models on Medical Licensing Exams Worldwide: A Systematic Review and Network Meta-Analysis With Meta-Regression. Cureus 2025
  9. Okayo O, Panwal N, Ihuarulam O, Nzunde M, Nyimwadang F, Oladosu T, Osunde A. Transforming Medical Laboratory Science with Vision-Language Models: A Focus on Microscopy in Microbiology, Hematology, and Histopathology. Oncology, Nuclear Medicine and Transplantology 2025;1(2):onmt008
  10. Nguyen V, Vuong T, Nguyen V, Ma H. Benchmarking large-language-model vision capabilities in oral and maxillofacial anatomy: A cross-sectional study. PLOS One 2025;20(10):e0335775
  11. Dundas N, Law T, Brender T, Mills H, Espejo E, A. Heintz T, Wallace A, Cobert J. All That Shines Is Not Gold: Maintaining Scientific Rigor When Evaluating, Interpreting, and Reviewing Studies Using Large Language Models. Anesthesiology 2026;144(2):272
  12. Wu S, Xu C, Xue Z, Huang Y, Xu G, Cui Y, Ma J, Ma R, Xie C. Beyond structured knowledge: performance boundaries of ChatGPT in geological-hazard question answering and the need for human-in-the-loop oversight. Frontiers in Earth Science 2026;13
  13. Xin J, He X. Evaluating Large Language Models as Medical Consultation Tools for Double Eyelid Surgery: A Cross-Language Study in English and Chinese. Aesthetic Plastic Surgery 2026;50(5):1706
  14. El Natour D, Abou Alfa M, Chaaban A, Assi R, Dally T, Bou Dargham B. Performance of 5 AI Models on United States Medical Licensing Examination Step 1 Questions: Comparative Observational Study. JMIR AI 2026;5:e76928
  15. Yao Z, Zhao Y, Mitra A, Levy D, Druhl E, Tsai J, Yu H. SynthEHR-eviction: enhancing eviction SDoH detection with LLM-augmented synthetic EHR data. npj Digital Medicine 2026
  16. Suh P, Suh C. Do General-Purpose Multimodal Large Language Models Really See Radiologic Images or Rely on Text? Korean Journal of Radiology 2026;27(4):297
  17. Ramsthaler F, Verhoff M. KI-basierte Bilder zu forensischen Demonstrationszwecken. Rechtsmedizin 2026;36(2):74
  18. Strasser L, Anschuetz W, Dennstädt F, Hastings J. Performance Evaluation of Large Language Models in Multilingual Medical Multiple-Choice Questions: Mixed Methods Study. JMIR Medical Education 2026;12:e81399
  19. Hack S, Craig J, Lin C, Fu C, Kwiatkowska M, Kocum P, Allevi F, Saibene A. Retrieval-augmented generative AI enhances clinical reasoning in odontogenic sinusitis versus maxillary sinus mucositis. European Archives of Oto-Rhino-Laryngology 2026
  20. Meshram H, Bhagat C, Puri S, Gadireddy S, Modasia B, Batheja V, Mathur R. Effect of large language model on diagnostic accuracy and clinical completeness among nephrology fellows managing transplant infection. International Urology and Nephrology 2026

Conference Proceedings

  1. Lv Y, Yu Q, Wang Z, Liang Y, Wang F, Li S. 2025 7th International Conference on Artificial Intelligence Technologies and Applications (ICAITA). CoupletEval: A Novel Benchmark for Assessing Chinese Linguistic Proficiency in Large Language Models