Published in Vol 27 (2025)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/69910.
Assessing the Accuracy and Reliability of Large Language Models in Psychiatry Using Standardized Multiple-Choice Questions: Cross-Sectional Study

Journals

  1. Cassim E, Prewitt M, Walsh D. Medical Apps for Physicians. Medical Clinics of North America 2026;110(2):237
  2. Ohu F, Burrell D, Jones L. Public Health Risk Management, Policy, and Ethical Imperatives in the Use of AI Tools for Mental Health Therapy. Healthcare 2025;13(21):2721
  3. Böke A, Hacker H, Chakraborty M, Baumeister-Lingens L, Vöckel J, Koenig J, Vogel D, Lichtenstein T, Vogeley K, Kambeitz-Ilankovic L, Kambeitz J. Observer-Independent Assessment of Content Overlap in Mental Health Questionnaires: Large Language Model–Based Study. JMIR AI 2025;4:e79868
  4. Voultsiou E, Moussiades L. A Systematic Review of Large Language Models in Mental Health: Opportunities, Challenges, and Future Directions. Electronics 2026;15(3):524
  5. Han B, Barnes T, Reddy C, Shin A. Evaluating Large Language Model–Generated Clinical Summaries Through a Dual-Perspective Framework: Retrospective Observational Study. JMIR AI 2026;5:e85221
  6. Polyzou M, Baraliakos X. Artificial Intelligence (AI) in rheumatology: a comparative evaluation of the ChatGPT and DeepSeek application. BMC Rheumatology 2026;10(1)
  7. Güler I, Grieb G, Kraus A, Stelling H. Artificial Intelligence in Plastic Surgery Education: A Global Multimodel Benchmark of Large Language Models on the Plastic Surgery In-Service Training Examination. Aesthetic Surgery Journal Open Forum 2026;8
  8. Jiao R, Chen M, Zhang J. Assessing large language model responses to pediatric depression FAQs: a cross-sectional study on readability, accuracy, and sentiment. Frontiers in Psychiatry 2026;17

Conference Proceedings

  1. Perea del Olmo C, Coyle D. Generative AI in the Online Mental Health Information Ecosystem: Young Adults' Use and Perceptions. Proceedings of the Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems