Evaluation Framework of Large Language Models in Medical Documentation: Development and Usability Study

Journals

Li J, Li Z, Jian F. Letter to the Editor of the Journal of Medical Systems: Regarding “Evaluation of the Performance of Three Large Language Models in Clinical Decision Support: A Comparative Study Based on Actual Cases”. Journal of Medical Systems 2025;49(1)
Adapa K, Rajan S, Venketa A, Mazur L. Reclaiming clinical time: The promise and challenges of ambient AI for oncology nurses in the Asia–Pacific region. Asia-Pacific Journal of Oncology Nursing 2025;12:100737
Chow J, Li K. Large Language Models in Medical Chatbots: Opportunities, Challenges, and the Need to Address AI Risks. Information 2025;16(7):549
Leung T, Coristine A, Benis A. AI Scribes in Health Care: Balancing Transformative Potential With Responsible Integration. JMIR Medical Informatics 2025;13:e80898
Papageorgiou P, Christodoulou R, Pitsillos R, Petrou V, Vamvouras G, Kormentza E, Papagelopoulos P, Georgiou M. The Role of Large Language Models in Improving Diagnostic-Related Groups Assignment and Clinical Decision Support in Healthcare Systems: An Example from Radiology and Nuclear Medicine. Applied Sciences 2025;15(16):9005
Hack S, Attal R, Locatelli G, Scotta G, Maniaci A, Parisi F, van der Poel N, Van Daele M, Garcia‐Lliberos A, Rodriguez‐Prado C, Chiesa‐Estomba C, Andueza‐Guembe M, Cobb P, Zalzal H, Saibene A. Surgeon, Trainee, or GPT? A Blinded Multicentric Study of AI‐Augmented Operative Notes. The Laryngoscope 2026;136(2):605
Alves J, Azevedo R, Encarnação R, Marques A, Alves P. Artificial intelligence to support clinical judgement in nursing: a scoping review protocol. Athena Health & Research Journal 2025;2(3)
Du S, Huang Y, Yuan Q, Dai Y, Shi Z, Hu M. Rule-augmented LLM framework for detecting unreasonableness in ICU. Displays 2026;91:103196
Dasa D, Board M, Rolfe U, Dolby T, Tang W. Evaluating AI-driven characters in extended reality (XR) healthcare simulations: A systematic review. Artificial Intelligence in Medicine 2025;170:103270
Ming S, Yao X, Guo Q, Chen D, Guo X, Xie K, Lei B. Evaluation of DeepSeek-R1 for Ophthalmic Diagnosis and Reasoning: A Comparison with OpenAI o1 and o3. Journal of Medical Systems 2025;49(1)
Song J, Park J, Kim J, You S. Large Language Model Assistant for Emergency Department Discharge Documentation. JAMA Network Open 2025;8(10):e2538427
Bai J, Ji X, Yu J, Wang Y, Guo Y, Xue C, Zhang W, Zhu J. Assessing the Quality of AI Responses to Patient Concerns About Axial Spondyloarthritis: Delphi-Based Evaluation. JMIR AI 2026;5:e79153
Chang Q, Chen F, Chen Y, Cheng L, Dong D, Dong J, Feng X, Ge J, He J, He Y, He Z, Ji H, Jiang X, Jiang Z, Li N, Li P, Li Y, Liu B, Liu J, Lyu H, Min D, Qi W, Shen X, Sheng B, Sun J, Sun Y, Tian B, Wang K, Wang L, Wang L, Wang W, Wang Y, Wang Y, Wang Z, Weng J, Wei J, Wu G, Wu X, Xiao Y, Xu Y, Yan P, Ye Z, Yin W, Zhang C, Zhang D, Zhang P, Zhang W, Zhang X, Zhao S, Zhao Y, Zhou S, Zhou X, Zhu B, Zhu L, Zhu Z. 2025 Expert consensus on retrospective evaluation of large language model applications in clinical scenarios. Intelligent Medicine 2025;5(4):318
Lim K, Kang U, Li X, Kim J, Jung Y, Park S, Kim B. Susceptibility of Large Language Models to User-Driven Factors in Medical Queries. Journal of Healthcare Informatics Research 2026;10(2):498
Hirata R, Oda Y, Morikawa S, Shigematsu K, Yamamoto D, Ito S, Tago M. On-Premises AI-Tool for Generating Nursing Care Summaries: A Phased-Implementation Study in Japan. Nursing: Research and Reviews 2025;15:215
Yost C, Jumreornvong O, Hasoon J, Ruan Q, Ang S, Palumbo P, Bianco G, Yong R, Zetter G, Christo P, Kaye A, Hilger H, Chung M, Duszynski B, Robinson C. Artificial Intelligence-Assisted Scribing for Chronic Pain Care: A Narrative Review. Current Pain and Headache Reports 2026;30(1)
Jacobs M, Oosterhoff J, Agricola R, van der Weegen W. Large language models versus healthcare professionals in providing medical information to patient questions: A systematic review. International Journal of Medical Informatics 2026;209:106250
Yesha R, Orezzoli M, Sims K, Landau A. Digital Mental Health Through an Intersectional Lens: A Narrative Review. Healthcare 2026;14(2):211
Shang L, Chen Y, Li R, Zhang X, Gao M, Hou Y, Zhang G. Exploring the prognostic utility of large language models versus traditional clinical models in heart failure: a pilot study. International Journal of Surgery 2026;112(3):5778
Choi D, Seo J, Cha W, Kim M, Heo S, Chang H, Kim T. Automated chain-of-thought evaluation framework for large language model–generated emergency department documentation: a simulation-based study. Clinical and Experimental Emergency Medicine 2026;13(1):53
Sismanoglu S, Isik V, Kayahan M. Performance of large language models in endodontics: accuracy, consistency, and benchmarking with consensus guidelines. BMC Oral Health 2026;26(1)
Wang H, Du W, Yang B, Liu M, Xu C, Zhang W, Xu C, He L, Zhang W, Yu Y, Lin J, Peng X. Evaluating open-source LLMs for dental EMR generation. BMC Oral Health 2026;26(1)
Azar A, Mohasefi J, Wiil U, Naemi A, Ebrahimi A. Artificial intelligence language models for medical text analysis: A systematic review. Artificial Intelligence in Medicine 2026:103441
Peters S, Boyum J, Legler S, Heise K, Griffin A, Heaton H. Generative artificial intelligence for inpatient documentation summarization: mixed-methods quality assessment and early real-world experience. Journal of the American Medical Informatics Association 2026
Luo J, Wang Y, Zhao M, Yin J, Ding D, Wu X. Deficiencies in clinical reasoning of LLMs in low back pain management and remediation via prompt engineering: from performance evaluation to error diagnosis. Frontiers in Artificial Intelligence 2026;9
Ilyas M. Importance and efficacy of advanced parametric and non-parametric tests commonly used in health sciences research. International journal of health sciences 2026;10(S1):207
Seo J, Kim T, Kim J. Assessing Eligibility for Anticancer Drug Health Insurance Reimbursement Using Large Language Models: Benchmark Development and Comparative Study. Journal of Medical Internet Research 2026;28:e95877
Lampignano J, Kale A, Kumar S, Shah D, Reddy B, Lowe D. Beyond Validation: Operationalising Post-Deployment Surveillance of AI Medical Devices in Clinical Practice. Journal of Medical Systems 2026;50(1)
González Zazueta L, López Covarrubias B, Navarro Cota C, Vázquez Briseño M, Nieto Hipólito J, Romo Cárdenas G, Avilés Rodríguez G. Evaluation Frameworks for Clinical Foundation Models in Specific Tasks of Unstructured Medical Text Analysis: A Scoping Review. Healthcare 2026;14(13):1865
Karataş S, Öner S. Pre-deployment safety and governance assessment of LLM-based clinical decision support systems: A health technology assessment-oriented evaluation framework. Health Policy and Technology 2026;15(9):101281
Heyat M, Rehman A, Zeeshan H, Hayat M, Akhtar F, Sadaf, Ansari M, Wang L, Lai D, Prasath V, Gandhi T, Sawan M. Large Language Model: Future of Healthcare Research With Challenges. WIREs Data Mining and Knowledge Discovery 2026;16(3)
Baek G, Lee H, Yang D, Kang M, Lee K, Choi M, Lee Y, Lee K. Teaching Model Context Protocol, Retrieval-Augmented Generation, and AI Agents to a Multidisciplinary Hospital Workforce: Single-Group Pre-Post Survey Study (Preprint). JMIR Medical Education 2026

Books/Policy Documents

Chandra J, Malviya M, Sabu S, Rajendran R, Joseph A. Challenges and Applications of Generative Large Language Models.

Conference Proceedings

Srinivasan M, Abdel J. GenFair: Systematic Test Generation for Fairness Fault Detection in Large Language Models. 2025 IEEE International Conference on Artificial Intelligence Testing (AITest).
