Published in Vol 26 (2024)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/58329.
Evaluation Framework of Large Language Models in Medical Documentation: Development and Usability Study

Journals

  1. Li J, Li Z, Jian F. Letter to the Editor of the Journal of Medical Systems: Regarding “Evaluation of the Performance of Three Large Language Models in Clinical Decision Support: A Comparative Study Based on Actual Cases”. Journal of Medical Systems 2025;49(1)
  2. Adapa K, Rajan S, Venketa A, Mazur L. Reclaiming clinical time: The promise and challenges of ambient AI for oncology nurses in the Asia–Pacific region. Asia-Pacific Journal of Oncology Nursing 2025;12:100737
  3. Chow J, Li K. Large Language Models in Medical Chatbots: Opportunities, Challenges, and the Need to Address AI Risks. Information 2025;16(7):549
  4. Leung T, Coristine A, Benis A. AI Scribes in Health Care: Balancing Transformative Potential With Responsible Integration. JMIR Medical Informatics 2025;13:e80898
  5. Papageorgiou P, Christodoulou R, Pitsillos R, Petrou V, Vamvouras G, Kormentza E, Papagelopoulos P, Georgiou M. The Role of Large Language Models in Improving Diagnostic-Related Groups Assignment and Clinical Decision Support in Healthcare Systems: An Example from Radiology and Nuclear Medicine. Applied Sciences 2025;15(16):9005
  6. Hack S, Attal R, Locatelli G, Scotta G, Maniaci A, Parisi F, van der Poel N, Van Daele M, Garcia‐Lliberos A, Rodriguez‐Prado C, Chiesa‐Estomba C, Andueza‐Guembe M, Cobb P, Zalzal H, Saibene A. Surgeon, Trainee, or GPT? A Blinded Multicentric Study of AI‐Augmented Operative Notes. The Laryngoscope 2025
  7. Alves J, Azevedo R, Encarnação R, Marques A, Alves P. Artificial intelligence to support clinical judgement in nursing: a scoping review protocol. Athena Health & Research Journal 2025;2(3)
  8. Du S, Huang Y, Yuan Q, Dai Y, Shi Z, Hu M. Rule-augmented LLM framework for detecting unreasonableness in ICU. Displays 2026;91:103196
  9. Dasa D, Board M, Rolfe U, Dolby T, Tang W. Evaluating AI-driven characters in extended reality (XR) healthcare simulations: A systematic review. Artificial Intelligence in Medicine 2025;170:103270
  10. Ming S, Yao X, Guo Q, Chen D, Guo X, Xie K, Lei B. Evaluation of DeepSeek-R1 for Ophthalmic Diagnosis and Reasoning: A Comparison with OpenAI o1 and o3. Journal of Medical Systems 2025;49(1)
  11. Song J, Park J, Kim J, You S. Large Language Model Assistant for Emergency Department Discharge Documentation. JAMA Network Open 2025;8(10):e2538427
  12. Bai J, Ji X, Yu J, Wang Y, Guo Y, Xue C, Zhang W, Zhu J. From Patient Concerns to AI Responses: A Delphi-Based Quality Assessment for Axial Spondyloarthritis (Preprint). JMIR AI 2025
  13. Chang Q, Chen F, Chen Y, Cheng L, Dong D, Dong J, Feng X, Ge J, He J, He Y, He Z, Ji H, Jiang X, Jiang Z, Li N, Li P, Li Y, Liu B, Liu J, Lyu H, Min D, Qi W, Shen X, Sheng B, Sun J, Sun Y, Tian B, Wang K, Wang L, Wang L, Wang W, Wang Y, Wang Y, Wang Z, Weng J, Wei J, Wu G, Wu X, Xiao Y, Xu Y, Yan P, Ye Z, Yin W, Zhang C, Zhang D, Zhang P, Zhang W, Zhang X, Zhao S, Zhao Y, Zhou S, Zhou X, Zhu B, Zhu L, Zhu Z. 2025 Expert consensus on retrospective evaluation of large language model applications in clinical scenarios. Intelligent Medicine 2025
  14. Lim K, Kang U, Li X, Kim J, Jung Y, Park S, Kim B. Susceptibility of Large Language Models to User-Driven Factors in Medical Queries. Journal of Healthcare Informatics Research 2025

Conference Proceedings

  1. Srinivasan M, Abdel J. 2025 IEEE International Conference on Artificial Intelligence Testing (AITest). GenFair: Systematic Test Generation for Fairness Fault Detection in Large Language Models