Evaluating and Enhancing Large Language Models’ Performance in Domain-Specific Medicine: Development and Usability Study With DocOA

Journals

Woo J, Yang A, Olsen R, Hasan S, Nawabi D, Nwachukwu B, Williams R, Ramkumar P. Custom Large Language Models Improve Accuracy: Comparing Retrieval Augmented Generation and Artificial Intelligence Agents to Noncustom Models for Evidence‐Based Medicine. Arthroscopy 2025;41(3):565
Yang X, Li T, Su Q, Liu Y, Kang C, Lyu Y, Zhao L, Nie Y, Pan Y. Application of large language models in disease diagnosis and treatment. Chinese Medical Journal 2025;138(2):130
Liu S, McCoy A, Wright A. Improving large language model applications in biomedicine with retrieval-augmented generation: a systematic review, meta-analysis, and clinical development guidelines. Journal of the American Medical Informatics Association 2025;32(4):605
Al Zo’ubi M. Review of 2024 publications on the applications of artificial intelligence in rheumatology. Clinical Rheumatology 2025;44(4):1427
Chen X, Yi H, You M, Liu W, Wang L, Li H, Zhang X, Guo Y, Fan L, Chen G, Lao Q, Fu W, Li K, Li J. Enhancing diagnostic capability with multi-agents conversational large language models. npj Digital Medicine 2025;8(1)
Tait K, Cronin J, Wiper O, Wallis J, Davies J, Dürichen R. ArcTEX—a novel clinical data enrichment pipeline to support real-world evidence oncology studies. Frontiers in Digital Health 2025;7
Kim M, Hwang G, Chang J, Chang S, Roh H, Park R. Performance of Open-Source Large Language Models in Psychiatry: Usability Study Through Comparative Analysis of Non-English Records and English Translations. Journal of Medical Internet Research 2025;27:e69857
Amugongo L, Mascheroni P, Brooks S, Doering S, Seidel J, Liu X. Retrieval augmented generation for large language models in healthcare: A systematic review. PLOS Digital Health 2025;4(6):e0000877
Qiang S, Zhang H, Liao Y, Zhang Y, Gu Y, Wang Y, Xu Z, Shi H, Han N, Yu H. Application of Large Language Models in Stroke Rehabilitation Health Education: 2-Phase Study. Journal of Medical Internet Research 2025;27:e73226
Qiu L, Tang C, Bi X, Burtch G, Chen Y, Zhang H. Physician Use of Large Language Models: A Quantitative Study Based on Large-Scale Query-Level Data. Journal of Medical Internet Research 2025;27:e76941
Kang H, Li J, Hou L, Xu X, Zheng S, Li Q. Large Language Model–Enhanced Drug Repositioning Knowledge Extraction via Long Chain-of-Thought: Development and Evaluation Study. JMIR Medical Informatics 2025;13:e77837
Lanzieri N, Dempsey A, Olsen A, Samelson H, Plass J. Creating an AI Powered VR Simulation Platform for Social Work Skill Development. Journal of Technology in Human Services 2025:1
Darnell S, Overall R, Guarracino A, Colonna V, Villani F, Garrison E, Isaac A, Muli P, Muriithi F, Kabui A, Kilyungi M, Lisso F, Kibet A, Muhia B, Nijveen H, de Ligt J, Yousefi S, Ashbrook D, Huang P, Suh G, Umar M, Batten C, Chen H, Sen S, Williams R, Prins P. Creating a biomedical knowledge base by addressing GPT inaccurate responses and benchmarking context. Open Research Africa 2025;8:12
Bai J, Ji X, Yu J, Wang Y, Guo Y, Xue C, Zhang W, Zhu J. Assessing the Quality of AI Responses to Patient Concerns About Axial Spondyloarthritis: Delphi-Based Evaluation. JMIR AI 2026;5:e79153
Chang Q, Chen F, Chen Y, Cheng L, Dong D, Dong J, Feng X, Ge J, He J, He Y, He Z, Ji H, Jiang X, Jiang Z, Li N, Li P, Li Y, Liu B, Liu J, Lyu H, Min D, Qi W, Shen X, Sheng B, Sun J, Sun Y, Tian B, Wang K, Wang L, Wang L, Wang W, Wang Y, Wang Y, Wang Z, Weng J, Wei J, Wu G, Wu X, Xiao Y, Xu Y, Yan P, Ye Z, Yin W, Zhang C, Zhang D, Zhang P, Zhang W, Zhang X, Zhao S, Zhao Y, Zhou S, Zhou X, Zhu B, Zhu L, Zhu Z. 2025 Expert consensus on retrospective evaluation of large language model applications in clinical scenarios. Intelligent Medicine 2025;5(4):318
Yang S, Jing M, Wang S, Huang Z, Wang J, Kou J, Shi M, Xia Z, Wei Q, Xing W, Hu Y, Zhu Z. Building trustworthy large language model-driven generative recommender system for healthcare decision support: A scoping review of corpus sources, customization techniques, and evaluation frameworks. Artificial Intelligence in Medicine 2026;171:103310
Pohlmann P, Glienke M, Sandkamp R, Gratzke C, Schmal H, Schoeb D, Fuchs A. Assessing the Efficacy of Ortho GPT: A Comparative Study with Medical Students and General LLMs on Orthopedic Examination Questions. Bioengineering 2025;12(12):1290
Zhang Z, Momeni Nezhad M, Bagher Hosseini S, Zolnour A, Zonour Z, Hosseini S, Topaz M, Zolnoori M. Advancing healthcare with large language models: A scoping review of applications and future directions. International Journal of Medical Informatics 2026;208:106231
Lu J, Huang J, Guo Y, Wu Q, Jiang Z, Yang T, Bian J, Bo L. Comparative analysis of six large language models in perioperative decision support for geriatric patients with multimorbidity: a three-dimensional evaluation framework. BMC Anesthesiology 2026;26(1)
Wang Y, Luan Y, Cheng S, Hao M, Tan W, Hu R, Yao Z, Wang J, Wu J. A multi-layer retrieval-augmented large language model framework for enhancing hypertension education. Hypertension Research 2026;49(4):1428
Wang Q, Zou H, Zhang H, Huang Y, Tian J, Cheng W. A Survey on Medical Competence Evaluation Benchmarks for Large Language Models. Health Care Science 2026;5(1):4
Imran M, Lee Y. Multimodal Vision–Language Models in Medical Imaging: A Survey of Retrieval, Interpretability, and Trust. IEEE Access 2026;14:19511
Es’haghi A, Aliyariparand M, Jamalipour Soufi K, Aghaei H. Accuracy and completeness of large language models in Epidemic keratoconjunctivitis Queries: A Comparative study. International Journal of Medical Informatics 2026;213:106363
Fairley J, Kapoor M, Sharma D. Generative artificial intelligence in osteoarthritis: A systematic scoping review of current applications and future directions. Osteoarthritis and Cartilage 2026;34(6):784
Chen X, Zhou H, Yi H, You M, Liu W, Wang L, Qin Z, Li H, Zhang X, Guo Y, Li S, Hu Y, Xiong Q, Li R, Fan L, Lao Q, Fu W, Li J, Li K. Grounding large language models in clinical diagnostics. Nature Communications 2026;17(1)
Karakoyun Z, Yörük M, Özdemir M, Koşar M. Evaluating the clinical decision-making performance of large language models in clinically oriented thoracic anatomy scenarios: a comparative evaluation study. BMC Medical Education 2026;26(1)
Ren K, Weng Q, Chen Q, Li H, Xie D, Zeng C, Wei J, Lei G, Wang Y. The application of large language models in orthopedic postgraduate education: potentials, challenges, and future prospects. Journal of Orthopaedic Surgery and Research 2026;21(1)
Jalili J, Gavhane Y, Walker E, Heinke A, Bowd C, Belghith A, Fazio M, Girkin C, De Moraes C, Liebmann J, Baxter S, Weinreb R, Zangwill L, Christopher M. Image-Quality–Aware Multimodal Artificial Intelligence for Automated Structured OCT Report Generation in Glaucoma Evaluation. Ophthalmology Science 2026;6(8):101254
He K, Xiao Q, Chen W, Jing L, Wang Y, Li S, Yang D, Xu H, Pang K, Xiao R, Liu Z, Zhuoga D, Chen R, Li J, Chang L, Zhou Y, Zhang Z, Li R, Ying L, Li R, Wang H, Yin X, Zhen G, Cai S, Shan Q, Wang Q, Zhuoga D, Yangjin C, Luobu G, Ji T, Wu D. Authoritative Textbook-Augmented Large Language Models for High-Altitude Public Health Medical Education in the Xizang Autonomous Region: Cross-Sectional Comparative Evaluation Study. Journal of Medical Internet Research 2026;28:e92852
Wilk E, Taluri S, Howton T, Crumley A, Mrug M, Lasseigne B. AI in variant analysis: fast track to genetic diagnoses. Human Genetics 2026;145(1)

Books/Policy Documents

Lima S, Araújo D. Intelligent Systems.
Saafi Y, Omheni N, Mannai Z, Mahmoudi R. Artificial Intelligence in Medicine.

Conference Proceedings

Sun B. Proceedings of the 2025 4th International Conference on Health Big Data and Intelligent Healthcare. Building a Robust Disease Prediction Model for Medical Data: A Hybrid Approach Integrating Data Statistics and Artificial Intelligence
