Published in Vol 26 (2024)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/58158.
Evaluating and Enhancing Large Language Models’ Performance in Domain-Specific Medicine: Development and Usability Study With DocOA

Journals

  1. Woo J, Yang A, Olsen R, Hasan S, Nawabi D, Nwachukwu B, Williams R, Ramkumar P. Custom Large Language Models Improve Accuracy: Comparing Retrieval Augmented Generation and Artificial Intelligence Agents to Noncustom Models for Evidence-Based Medicine. Arthroscopy: The Journal of Arthroscopic & Related Surgery 2025;41(3):565
  2. Yang X, Li T, Su Q, Liu Y, Kang C, Lyu Y, Zhao L, Nie Y, Pan Y. Application of large language models in disease diagnosis and treatment. Chinese Medical Journal 2025;138(2):130
  3. Liu S, McCoy A, Wright A. Improving large language model applications in biomedicine with retrieval-augmented generation: a systematic review, meta-analysis, and clinical development guidelines. Journal of the American Medical Informatics Association 2025;32(4):605
  4. Al Zo’ubi M. Review of 2024 publications on the applications of artificial intelligence in rheumatology. Clinical Rheumatology 2025;44(4):1427
  5. Chen X, Yi H, You M, Liu W, Wang L, Li H, Zhang X, Guo Y, Fan L, Chen G, Lao Q, Fu W, Li K, Li J. Enhancing diagnostic capability with multi-agents conversational large language models. npj Digital Medicine 2025;8(1)
  6. Tait K, Cronin J, Wiper O, Wallis J, Davies J, Dürichen R. ArcTEX—a novel clinical data enrichment pipeline to support real-world evidence oncology studies. Frontiers in Digital Health 2025;7
  7. Kim M, Hwang G, Chang J, Chang S, Roh H, Park R. Performance of Open-Source Large Language Models in Psychiatry: Usability Study Through Comparative Analysis of Non-English Records and English Translations. Journal of Medical Internet Research 2025;27:e69857
  8. Amugongo L, Mascheroni P, Brooks S, Doering S, Seidel J, Liu X. Retrieval augmented generation for large language models in healthcare: A systematic review. PLOS Digital Health 2025;4(6):e0000877
  9. Qiang S, Zhang H, Liao Y, Zhang Y, Gu Y, Wang Y, Xu Z, Shi H, Han N, Yu H. Application of Large Language Models in Stroke Rehabilitation Health Education: 2-Phase Study. Journal of Medical Internet Research 2025;27:e73226
  10. Qiu L, Tang C, Bi X, Burtch G, Chen Y, Zhang H. Physician Use of Large Language Models: A Quantitative Study Based on Large-Scale Query-Level Data. Journal of Medical Internet Research 2025;27:e76941
  11. Kang H, Li J, Hou L, Xu X, Zheng S, Li Q. Large Language Model–Enhanced Drug Repositioning Knowledge Extraction via Long Chain-of-Thought: Development and Evaluation Study. JMIR Medical Informatics 2025;13:e77837
  12. Lanzieri N, Dempsey A, Olsen A, Samelson H, Plass J. Creating an AI Powered VR Simulation Platform for Social Work Skill Development. Journal of Technology in Human Services 2025:1
  13. Darnell S, Overall R, Guarracino A, Colonna V, Villani F, Garrison E, Isaac A, Muli P, Muriithi F, Kabui A, Kilyungi M, Lisso F, Kibet A, Muhia B, Nijveen H, de Ligt J, Yousefi S, Ashbrook D, Huang P, Suh G, Umar M, Batten C, Chen H, Sen S, Williams R, Prins P. Creating a biomedical knowledge base by addressing GPT inaccurate responses and benchmarking context. Open Research Africa 2025;8:12
  14. Bai J, Ji X, Yu J, Wang Y, Guo Y, Xue C, Zhang W, Zhu J. From Patient Concerns to AI Responses: A Delphi-Based Quality Assessment for Axial Spondyloarthritis (Preprint). JMIR AI 2025
  15. Chang Q, Chen F, Chen Y, Cheng L, Dong D, Dong J, Feng X, Ge J, He J, He Y, He Z, Ji H, Jiang X, Jiang Z, Li N, Li P, Li Y, Liu B, Liu J, Lyu H, Min D, Qi W, Shen X, Sheng B, Sun J, Sun Y, Tian B, Wang K, Wang L, Wang L, Wang W, Wang Y, Wang Y, Wang Z, Weng J, Wei J, Wu G, Wu X, Xiao Y, Xu Y, Yan P, Ye Z, Yin W, Zhang C, Zhang D, Zhang P, Zhang W, Zhang X, Zhao S, Zhao Y, Zhou S, Zhou X, Zhu B, Zhu L, Zhu Z. 2025 Expert consensus on retrospective evaluation of large language model applications in clinical scenarios. Intelligent Medicine 2025
  16. Yang S, Jing M, Wang S, Huang Z, Wang J, Kou J, Shi M, Xia Z, Wei Q, Xing W, Hu Y, Zhu Z. Building trustworthy large language model-driven generative recommender system for healthcare decision support: A scoping review of corpus sources, customization techniques, and evaluation frameworks. Artificial Intelligence in Medicine 2026;171:103310
  17. Pohlmann P, Glienke M, Sandkamp R, Gratzke C, Schmal H, Schoeb D, Fuchs A. Assessing the Efficacy of Ortho GPT: A Comparative Study with Medical Students and General LLMs on Orthopedic Examination Questions. Bioengineering 2025;12(12):1290