Performance of Large Language Models in Numerical Versus Semantic Medical Knowledge: Cross-Sectional Benchmarking Study on Evidence-Based Questions and Answers

doi:10.2196/64452

Published on 14.Jul.2025 in Vol 27 (2025)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/64452, first published 17.Jul.2024.

Eden Avnat^{1,2}, MPH, MD; Michal Levy^{3,4}, BCS, MD; Daniel Herstain¹, MD; Elia Yanko⁵, BSc; Daniel Ben Joya^{2,6}, MD; Michal Tzuchman Katz², MD; Dafna Eshel², MD; Sahar Laros^{1,2}, BMedSci; Yael Dagan^{1,2}, BMedSci; Shahar Barami^{1,2}, BMedSci; Joseph Mermelstein², BCS; Shahar Ovadia², MCS; Noam Shomron¹, PhD; Varda Shalev¹, MD, MPH; Raja-Elie E Abdulnour⁷, MD

¹ Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel

² Kahun Medical Ltd, Givatayim, Israel

³ Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel

⁴ School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel

⁵ The Azrieli Faculty of Medicine, Bar-Ilan University, Safed, Israel

⁶ Kaplan Medical Center, Rehovot, Israel

⁷ Division of Pulmonary and Critical Care Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States

Corresponding Author:

Eden Avnat, MPH, MD
Faculty of Medicine
Tel Aviv University
Chaim Levanon St 55
Tel Aviv 6997801
Israel
Phone: 972 545299622
Email: edenavnat@mail.tau.ac.il

Citation

Please cite as:

Avnat E, Levy M, Herstain D, Yanko E, Ben Joya D, Tzuchman Katz M, Eshel D, Laros S, Dagan Y, Barami S, Mermelstein J, Ovadia S, Shomron N, Shalev V, Abdulnour REE
Performance of Large Language Models in Numerical Versus Semantic Medical Knowledge: Cross-Sectional Benchmarking Study on Evidence-Based Questions and Answers
J Med Internet Res 2025;27:e64452
doi: 10.2196/64452 PMID: 40658983 PMCID: 12279315

This paper is in the following e-collections/theme issues:

Artificial Intelligence; Research Instruments, Questionnaires, and Tools; Generative Language Models Including ChatGPT; AI Language Models in Health Care; Foundation Models and Their Applications in AI
