Published in Vol 27 (2025)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/70901.

Beyond Benchmarks: Evaluating Generalist Medical Artificial Intelligence With Psychometrics

Journals

  1. Yeh Y, Shih M, De Backer D, Celi L, See K, Fujii T, Ling L, Mongkolpun W, Hu H, Chen H, Chen W, Cholley B, Fong K, Ryu H, Na S, Egi M, Chan W, Chen K, Kamaleswaran R, Chuang Y, Yang C, Hsiao W, Lai S, Ku D, Jahan A, Martin G. The IMPACT framework for evaluating generative AI in critical care: development and multinational consensus validation. Annals of Intensive Care 2026;16:100078
  2. Liu J, Liu S, Wright A. Benchmark Integrity and Reasoning-Trace Errors in Medical Question Answering With Large Language Models: Mixed Methods Study With Sparse Autoencoders (Preprint). Journal of Medical Internet Research 2025