Beyond Benchmarks: Evaluating Generalist Medical Artificial Intelligence With Psychometrics

doi:10.2196/70901

Published on 26.May.2025 in Vol 27 (2025)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/70901, first published 05.Jan.2025.

Luning Sun¹; Christopher Gibbons²; José Hernández-Orallo³,⁴; Xiting Wang⁵; Liming Jiang⁶; David Stillwell¹; Fang Luo⁶; Xing Xie⁷

Citation

Please cite as:

Sun L, Gibbons C, Hernández-Orallo J, Wang X, Jiang L, Stillwell D, Luo F, Xie X
Beyond Benchmarks: Evaluating Generalist Medical Artificial Intelligence With Psychometrics
J Med Internet Res 2025;27:e70901
doi: 10.2196/70901 PMID: 40418851 PMCID: 12129431

This paper is in the following e-collection/theme issue:

Viewpoints and Perspectives; Development and Evaluation of Research Methods, Instruments and Tools; Artificial Intelligence; Applications of AI
