Published in Vol 27 (2025)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/84120.
Knowledge-Practice Performance Gap in Clinical Large Language Models: Systematic Review of 39 Benchmarks


Journals

  1. Lin X, Yang Y, Ren Y. Making Chatbots more human: deep reasoning large language models in ophthalmology. Frontiers in Medicine 2026;12
  2. Spieser J, Balapour A, Meller J, Patra K, Shamsaei B. A Review of Multi-Agent AI Systems for Biological and Clinical Data Analysis. Methods and Protocols 2026;9(2):33
  3. Zhu Q, Li Q, Zan Y, Lu Y, Xia L, Xia Y, Xu T. Patient-centered gastrointestinal function assessment technologies: a paradigm shift from traditional approaches to non-invasive innovations. Frontiers in Physiology 2026;17
  4. Prause M. No skin in the game: why agentic AI requires principal-agent governance. AI and Ethics 2026;6(2)
  5. Eltaybani S. Knowledge Cut‐Off in Large Language Models: Implications for Critical Care Nursing. Nursing in Critical Care 2026;31(3)
  6. Lee W, Kim J, Leem J, Lee B, Lee S, Kim Y. Benchmark Evaluation of a Tool-Augmented Large Language Model Agent Using Traditional Asian Medicine Metadata. Applied Sciences 2026;16(7):3377
  7. Mine Y, Taji T, Okazaki S, Takeda S, Shimoe S, Kaku M, Nikawa H, Kakimoto N, Murayama T. Beyond exam accuracy: Tracking a persistent-failure set reveals visual dental reasoning gaps in multimodal LLMs. Journal of Dentistry 2026;170:106675
  8. Wang X, Yin C, He H, Guo J, Fu X, Bai F. Benchmarking public large language model responses to patient-facing inflammatory bowel disease questions: informational quality, transparency proxies, and readability. Frontiers in Public Health 2026;14
  9. Keshav T, Chow D, Kippenberger T, Livezey J, Aranda M. Evaluating Large-Language Models Against Providers on Surgical Diagnostic Reasoning Tasks. Journal of Surgical Research 2026;322:259
  10. Rajwal S, Pandey A, Zhang Z, Chen Y, Liu M, Das S, Rogers H, Sarker A, Xiao Y. Applications of Natural Language Processing and Large Language Models for Social Determinants of Health: Systematic Review. Journal of Medical Internet Research 2026;28:e83793
  11. Bajwa M, Hoyt R, Knight D, Haider M. The Performance of DeepSeek R1 and Gemini 3 in Complex Medical Scenarios: Comparative Study. JMIRx Med 2026;7:e76822

Conference Proceedings

  1. Kumar A, Joshi S, Sachdeva S. JsonUtil: An Open-Source RESTful JSON-Based Dynamic Form Generation Framework validation with OpenEHR ORBDA Benchmarking Dataset. 2026 International Conference on Signal Processing and Electronics Design (ICSPED)