Potential Roles of Large Language Models in the Production of Systematic Reviews and Meta-Analyses

Siemens W, von Elm E, Binder H, Böhringer D, Eisele-Metzger A, Gartlehner G, Hanegraaf P, Metzendorf M, Mosselman J, Nowak A, Qureshi R, Thomas J, Waffenschmidt S, Labonté V, Meerpohl J. Opportunities, challenges and risks of using artificial intelligence for evidence synthesis. BMJ Evidence-Based Medicine 2025;30(6):381
Levkovich I. Evaluating Diagnostic Accuracy and Treatment Efficacy in Mental Health: A Comparative Analysis of Large Language Model Tools and Mental Health Professionals. European Journal of Investigation in Health, Psychology and Education 2025;15(1):9
Oami T, Okada Y, Nakada T. GPT-3.5 Turbo and GPT-4 Turbo in Title and Abstract Screening for Systematic Reviews. JMIR Medical Informatics 2025;13:e64682
Sanghera R, Thirunavukarasu A, El Khoury M, O’Logbon J, Chen Y, Watt A, Mahmood M, Butt H, Nishimura G, Soltan A. High-performance automated abstract screening with large language model ensembles. Journal of the American Medical Informatics Association 2025;32(5):893
Matthay E, Neill D, Titus A, Desai S, Troxel A, Cerdá M, Díaz I, Santacatterina M, Thorpe L. Integrating Artificial Intelligence into Causal Research in Epidemiology. Current Epidemiology Reports 2025;12(1)
Luo X, Ge L, Zhang L, Chen Y, Du L. AI‐Empowered Evidence‐Based Research and Clinical Decision‐Making. Journal of Evidence-Based Medicine 2025;18(1)
Lu J, Choi K, Eremeev M, Gobburu J, Goswami S, Liu Q, Mo G, Musante C, Shahin M. Large Language Models and Their Applications in Drug Discovery and Development: A Primer. Clinical and Translational Science 2025;18(4)
Boyle A, Huo B, Sylla P, Calabrese E, Kumar S, Slater B, Walsh D, Vosburg R. Large language model-generated clinical practice guideline for appendicitis. Surgical Endoscopy 2025;39(6):3539
Luo X, Tham Y, Giuffrè M, Ranisch R, Daher M, Lam K, Eriksen A, Hsu C, Ozaki A, Moraes F, Khanna S, Su K, Begagić E, Bian Z, Chen Y, Estill J. Reporting guideline for the use of Generative Artificial intelligence tools in MEdical Research: the GAMER Statement. BMJ Evidence-Based Medicine 2025;30(6):390
Trad F, Yammine R, Charafeddine J, Chakhtoura M, Rahme M, El-Hajj Fuleihan G, Chehab A. Streamlining systematic reviews with large language models using prompt engineering and retrieval augmented generation. BMC Medical Research Methodology 2025;25(1)
Xie Z. Assessing the impact of chatbots on health decision-making: A multifactorial experimental approach. Technology and Health Care 2025;33(5):2266
Wang B, Luo X, Zhang J, Shi Q, Lai H, Liu H, Yao Y, Ge L, Liu J, Li H, Ma Y, Zhang L, Bian Z, Florez I, Chen Y, Estill J. Evaluating the Quality of Guidelines Using the AGREE II Tool by a Large Language Model vs Human Appraisers. JAMA Network Open 2025;8(5):e2512621
Kawakita T, Wong M, Gibson K, Gupta M, Gimovsky A, Moussa H, Hye H. Application of Generative AI to Enhance Obstetrics and Gynecology Research. American Journal of Perinatology 2025;42(16):2094
Luo X, Chen Y. Transparent reporting of generative artificial intelligence use in systematic reviews. Journal of the American Academy of Dermatology 2025;93(3):e117
Adel A, Alani N. Can generative AI reliably synthesise literature? exploring hallucination issues in ChatGPT. AI & SOCIETY 2025;40(8):6799
Ke L, Tong S, Cheng P, Peng K. Exploring the frontiers of LLMs in psychological applications: a comprehensive review. Artificial Intelligence Review 2025;58(10)
Strohl H, Do N, Irene Su H. Current and emerging data sources for assessment of ovarian toxicity in children, adolescents and young adults with cancer. Best Practice & Research Clinical Obstetrics & Gynaecology 2025;102:102640
Forero D, Abreu S, Tovar B, Oermann M. Automated analyses of risk of bias and critical appraisal of systematic reviews (ROBIS and AMSTAR 2): a comparison of the performance of 4 large language models. Journal of the American Medical Informatics Association 2025;32(9):1471
Bayani A, Epoh Ewane L, Oliveira dos Anjos D, Mac-Seing M, Nikiema J. Leveraging open-source large language models (LLMs) in scoping reviews: a case study on disability and AI applications. International Journal of Medical Informatics 2025;204:106048
Wang Z, Cao L, Danek B, Jin Q, Lu Z, Sun J. Accelerating clinical evidence synthesis with large language models. npj Digital Medicine 2025;8(1)
Tran H, Thu A, Twayana A, Fuertes A, Gonzalez M, Basta M, James M, Frishman W, Aronow W. The Role of Generative Artificial Intelligence and Large Language Models in Atrial Fibrillation: Clinical Research and Decision Support. Cardiology in Review 2025
Kataoka Y, Takayama T, Yoshimura K, So R, Tsujimoto Y, Yamagishi Y, Takagi S, Furukawa Y, Sakata M, Bašić Đ, Cipriani A, Cuijpers P, Karyotaki E, Harrer M, Leucht S, Homiar A, Ostinelli E, Miguel C, Rodolico A, Furukawa T. Automating the data extraction process for systematic reviews using GPT-4o and o3. Research Synthesis Methods 2026;17(1):42
Wang Z, Cao L, Jin Q, Chan J, Wan N, Afzali B, Cho H, Choi C, Emamverdi M, Gill M, Kim S, Li Y, Liu Y, Luo Y, Ong H, Rousseau J, Sheikh I, Wei J, Xu Z, Zallek C, Kim K, Peng Y, Lu Z, Sun J. A foundation model for human-AI collaboration in medical literature mining. Nature Communications 2025;16(1)
Shen J, Luo Z, Jia D, Wang S, Sun F, Wu J. Large language model enhanced framework for systematic reviews and meta-analyses. BMJ Digital Health & AI 2025;1(1):e000017
Vera V, Khandelwal V, Roy K, Garimella R, Surana H, Sheth A. AI‐Augmented Search for Systematic Reviews: A Comparative Analysis. Proceedings of the Association for Information Science and Technology 2025;62(1):705
Dun C, Couperus C, Lee S, Barnett R, Lee H, Xiong M, Wang Y, Wang Q, Lehmann H. Evaluation of Large Language Model Performance in Assessing Health Economic Study Quality. Journal of Health Economics and Outcomes Research 2025:154
Cheng P, Hu F, Chen L, Liu J, Wu J, Chen W. Generative artificial intelligence in ophthalmology research writing: A comprehensive review of applications, detection tools, and ethical considerations. Taiwan Journal of Ophthalmology 2026;16(1):68
Dun C, Couperus C, Lee S, Barrett R, Lee H, Xiong M, Wang Y, Wang Q, Lehmann H. Evaluation of Large Language Model Performance in Assessing Health Economic Study Quality. Journal of Health Economics and Outcomes Research 2025;12(2)
Liang Y, Yang C, Chen C, Lin Y, Lin S, Wang Y, Yang H. SMART 2.0 Statistical Metabolomics Analysis: An R Tool 2.0. Analytical Chemistry 2025;97(46):25453
Zhou Y, Hu H. Can AI assess literature like experts? An entropy-based comparison of ChatGPT-4o, DeepSeek R1, and human ratings. Frontiers in Research Metrics and Analytics 2025;10
Herman M, Qazi S, Farrell E, Song J, Cecchini M, Mirza K, Bui M, Hacking S. The AI-powered pathologist: A global survey mapping initial trends in AI adoption and outlook. Journal of Pathology Informatics 2025;19:100526
Caponio V, Lorenzo-Pouso A, Magalhaes M, Ali A, Adamo D, Cirillo N, López-Pintor R, Musella G. Accuracy of LLMs to retrieve numeric data for meta-analysis in dentistry. Journal of Dentistry 2026;164:106245
Li D, Jiang N, Huang K, Tu R, Ouyang S, Yu H, Qiao L, Yu C, Zhou T, Tong D, Wang Q, Li M, Zeng X, Tian Y, Tian X, Li J. Streamlining evidence based clinical recommendations with large language models. npj Digital Medicine 2025;8(1)
Li L, Mathrani A, Susnjak T. Transforming evidence synthesis: A systematic review of the evolution of automated meta-analysis in the age of AI. Research Synthesis Methods 2026;17(3):403
Mahomed S, Hanscheid T, Almeida Y, Grobusch M, Rebelo M. Improving reproducibility of Plasmodium falciparum culture: a large-language-model-driven literature review and a parasite growth variability assessment of donor red blood cells. Malaria Journal 2025;25(1)
Li L, Mathrani A, Susnjak T. What level of automation is “good enough”? A benchmark of large language models for meta-analysis data extraction. Research Synthesis Methods 2026;17(4):671
Artificial intelligence-facilitated evidence-based medicine research and practice: A position statement. Chinese Medical Journal 2026
Park H, Shin D, Kim N. Quality Evaluation of Generative AI–Based Search Strategies in Systematic Reviews and Comparison of Search Performance with Human Expert (Medical Librarian). Journal of Korean Medical Library Association 2025;52(1):28
Park Y, Zhang H, Bai S. Large language models in systematic review and meta-analysis of surgical treatments for vaginal vault prolapse. npj Digital Medicine 2026;9(1)
Ponon N, Basani M. Integrating AI and Large Language Models for Automated Data Quality Enhancement in Data Integration Systems. IEEE Open Journal of the Computer Society 2026;7:528
Nagao T, Kawakita T. Assessing the Reliability of Large Language Models for Evaluation of Risk of Bias in Randomized Clinical Trials. American Journal of Perinatology 2026
Speth K, Sengupta S, Shamsuzzaman M, Whitmore J, Li D, Chiang A, Xu Z, Song Q. Advancing cell and gene therapy: Application of AI/ML in clinical development and patient management. Journal of Biopharmaceutical Statistics 2026:1
Pei B, Sun X, Guo R, Zhang Z. LitPilot: A Human-Centered Platform for Transparent and Interactive Systematic Literature Reviews. International Journal of Human–Computer Interaction 2026:1
Vidor P, Casiraghi Y, de Souza A, Schmidt M. Assessing the risk of bias of clinical trials with large language models and ROBUST-RCT: a feasibility study. Scientific Reports 2026;16(1)
Chawla R, Maheshwari A, Modi A, Hirani H, Samrat S, Dey A, Parikh R, Saboo B, Chawla M, Gupta A, Jaggi S, Gupta S, Agarwal S, Chawla P, Saboo B, Kesavadev J, Seshadri K, Singh A, Viswanathan V, Mohan V, Sahay R, Hassanein M, Gupta A, Dube S, Dutta D, Mathur S, Dhruv U, Kapoor N, Agarwal A, Bhadada S, Vidyasagar S, Singh A, Kumar A, Rastogi A, AG U, Bhandari S, Singh D, Garg A, Srivastava S, Jethwani P, Chowdhury A, Goyal B, K S, Nath S, Purwar N, Sharma A, Mirshad R, Sahoo J, Agarwal K, Hasnani D, Chavda V, Thakker R, Saxena P, Prabhu M, Sinha S, Saxena A, Gupta R. RSSDI IJDDC position statement on embracing artificial intelligence responsibly in medical research and scientific publication: consensus recommendations for authors, editors, and peer reviewers. International Journal of Diabetes in Developing Countries 2026;46(2):363
Yao Y, Liu H, Yang D, Luo X, Lai H, Wang Z, Chen Y, Bian Z. Integration of large language models and evidence-based Chinese medicine: A scoping review. Integrative Medicine Research 2026;15(3):101349
Budau L, Finney R, Ensan F. Empowering open medium-sized generative language models for effective structured search in biomedical systematic reviews. International Journal of Medical Informatics 2026;216:106463
Liu H, Xu K, Zhang J, Wu S, Qin Y, Ma Y, Yu X, Zhang H, Li H, Wu M, Wang Z, Luo X, Wang B, Yao Y, Feng Y, Sun L, Dong M, Hong Y, Liu J, Yang R, Hu Y, Lai H, Zhou Q, Li X, Ge L, Chen Y, Bian Z. Design and methodology of the AI-empowered Clinical Evidence for Integrated Chinese-Western Medicine (ACE-iMed) platform. Integrative Medicine Research 2026;15(3):101351
Carrasco L, Engvall T, Dobslaw F. A Conceptual Crosswalk Between Trustworthy Records and Trustworthy AI. Philosophy & Technology 2026;39(2)
Nawrath M, Merlina A, Knight J, Welch S, Rashidian M, Seifert-Dähnn I. Validating Large Language Models for Title-Abstract Screening in Low-Prevalence Systematic Reviews: An Environmental Science Case Study. Information 2026;17(5):501
Brini S, Leung T. Value and Credibility of Meta-Analysis: Tutorial on Enhancing Methodological Rigor and AI-Powered Efficiency. Journal of Medical Internet Research 2026;28:e92132
Rosenkrantz A. Author Disclosure of Use of Generative Artificial Intelligence Technologies in Submissions to AJR: Reflections From the AJR Editor in Chief. American Journal of Roentgenology 2026
Zhai J, Aryadoust V. Synthesizing eye-tracking research in L2 listening assessment: a human-AI collaborative systematic review. Innovation in Language Learning and Teaching 2026:1

Conference Proceedings

Gehrmann J, Quakulinski L, Beyan O. 2024 2nd International Conference on Foundation and Large Language Models (FLLM). Large Language Models for Literature Reviews - An Exemplary Comparison of LLM-based Approaches with Manual Methods
Silva N, Wickramaarachchi D. 2025 5th International Conference on Advanced Research in Computing (ICARC). Enhancing Systematic Literature Reviews: Evaluating the Performance of LLM-Based Tools Across Key Systematic Literature Review Stages
S.P.A.A F, Wickramaarachchi D. 2025 5th International Conference on Advanced Research in Computing (ICARC). Large Language Model (LLM) Support for Preliminary Consultation in Healthcare
Köhler J, Harl M, Westner M, Strahringer S. 2025 27th International Conference on Business Informatics (CBI). Can AI be a Scholar? A Systematic Review on the Role of Generative AI in Systematic Literature Reviews
Hu J, Yao X, Huang Z, Liang D, Yang D, Liu D, Li J, Zhang Y, Ma X. Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems. From Human Pragmatic Language Skills to Conversational Agent Design: A Systematic Review of Transfer Strategies
Manoj S, Bajpai P. 2026 IEEE Conference on Technologies for Sustainability (SusTech). Irrigation Scheduling Dataset Extraction Using LLM-Mined Research Evidence

This paper is in the following e-collection/theme issue:

Potential Roles of Large Language Models in the Production of Systematic Reviews and Meta-Analyses
