Published in Vol 27 (2025)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/64452.

Performance of Large Language Models in Numerical Versus Semantic Medical Knowledge: Cross-Sectional Benchmarking Study on Evidence-Based Questions and Answers

Journals

  1. Gong E, Bang C, Lee J, Baik G. Knowledge-Practice Performance Gap in Clinical Large Language Models: Systematic Review of 39 Benchmarks. Journal of Medical Internet Research 2025;27:e84120
  2. Gély L, Chaillot M, Fréour T. Can large language models provide accurate and empathetic answers to the most frequently asked questions by infertile patients? A pilot study. Reproductive BioMedicine Online 2026;52(4):105221
  3. Martini S, Schluessel S, Aghamaliyev U, Rippl M, Deissler L, Tausendfreund O, Nuebler D, Mueller K, Schmidmaier R, Drey M. Expert Evaluation of the Perceived Accuracy, Relevance, and Safety of Large Language Model–Generated Patient Information in Geriatrics: Cross-Condition Study. JMIR AI 2026;5:e91369