Automated Paper Screening for Clinical Reviews Using Large Language Models: Data Analysis Study

doi:10.2196/48996

Published on 12.Jan.2024 in Vol 26 (2024)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/48996, first published 14.May.2023.

Eddie Guo¹; Mehul Gupta¹; Jiawen Deng²; Ye-Jean Park²; Michael Paget¹; Christopher Naugler¹

Cited by (154)

Journals

Kohandel Gargari O, Mahmoudi M, Hajisafarali M, Samiee R. Enhancing title and abstract screening for systematic reviews with GPT-3.5 turbo. BMJ Evidence-Based Medicine 2023:bmjebm-2023-112678 View
Ye A, Maiti A, Schmidt M, Pedersen S. A Hybrid Semi-Automated Workflow for Systematic and Literature Review Processes with Large Language Model Analysis. Future Internet 2024;16(5):167 View
Tran V, Gartlehner G, Yaacoub S, Boutron I, Schwingshackl L, Stadelmaier J, Sommer I, Alebouyeh F, Afach S, Meerpohl J, Ravaud P. Sensitivity and Specificity of Using GPT-3.5 Turbo Models for Title and Abstract Screening in Systematic Reviews and Meta-analyses. Annals of Internal Medicine 2024;177(6):791 View
Yoon D, Han C, Kim D, Kim S, Bae S, Ryu J, Choi Y. Redefining Health Care Data Interoperability: Empirical Exploration of Large Language Models in Information Exchange. Journal of Medical Internet Research 2024;26:e56614 View
Riaz I, Naqvi S, Hasan B, Murad M. Future of Evidence Synthesis: Automated, Living, and Interactive Systematic Reviews and Meta-analyses. Mayo Clinic Proceedings: Digital Health 2024;2(3):361 View
Landschaft A, Antweiler D, Mackay S, Kugler S, Rüping S, Wrobel S, Höres T, Allende-Cid H. Implementation and evaluation of an additional GPT-4-based reviewer in PRISMA-based medical systematic literature reviews. International Journal of Medical Informatics 2024;189:105531 View
Luo X, Chen F, Zhu D, Wang L, Wang Z, Liu H, Lyu M, Wang Y, Wang Q, Chen Y. Potential Roles of Large Language Models in the Production of Systematic Reviews and Meta-Analyses. Journal of Medical Internet Research 2024;26:e56780 View
Menold H, Wieland V, Haney C, Uysal D, Wessels F, Cacciamani G, Michel M, Seide S, Kowalewski K. Machine learning enables automated screening for systematic reviews and meta-analysis in urology. World Journal of Urology 2024;42(1) View
Akinseloyin O, Jiang X, Palade V. A question-answering framework for automated abstract screening using large language models. Journal of the American Medical Informatics Association 2024;31(9):1939 View
Guo E, Ramchandani R, Park Y, Gupta M. OSCEai: personalized interactive learning for undergraduate medical education. Canadian Medical Education Journal 2024 View
Park K, Choi H. How to Harness the Power of GPT for Scientific Research: A Comprehensive Review of Methodologies, Applications, and Ethical Considerations. Nuclear Medicine and Molecular Imaging 2024;58(6):323 View
Matsui K, Utsumi T, Aoki Y, Maruki T, Takeshima M, Takaesu Y. Human-Comparable Sensitivity of Large Language Models in Identifying Eligible Studies Through Title and Abstract Screening: 3-Layer Strategy Using GPT-3.5 and GPT-4 for Systematic Reviews. Journal of Medical Internet Research 2024;26:e52758 View
Li Y, Luan Z, Liu Y, Liu H, Qi J, Han D. Automated information extraction model enhancing traditional Chinese medicine RCT evidence extraction (Evi-BERT): algorithm development and validation. Frontiers in Artificial Intelligence 2024;7 View
Klang E, Alper L, Sorin V, Barash Y, Nadkarni G, Zimlichman E. Advancing radiology practice and research: harnessing the potential of large language models amidst imperfections. BJR|Open 2023;6(1) View
Arfaie S, Sadegh Mashayekhi M, Mofatteh M, Ma C, Ruan R, MacLean M, Far R, Saini J, Harmsen I, Duda T, Gomez A, Rebchuk A, Pingbei Wang A, Rasiah N, Guo E, Fazlollahi A, Rose Swan E, Amin P, Mohammed S, Atkinson J, Del Maestro R, Girgis F, Kumar A, Das S. ChatGPT and neurosurgical education: A crossroads of innovation and opportunity. Journal of Clinical Neuroscience 2024;129:110815 View
Reason T, Langham J, Gimblett A. Automated Mass Extraction of Over 680,000 PICOs from Clinical Study Abstracts Using Generative AI: A Proof-of-Concept Study. Pharmaceutical Medicine 2024;38(5):365 View
Lee J, Park S, Shin J, Cho B. Analyzing evaluation methods for large language models in the medical field: a scoping review. BMC Medical Informatics and Decision Making 2024;24(1) View
Sugiura A, Saegusa S, Jin Y, Yoshimoto R, Smith N, Dohi K, Higuchi T, Kozu T. Evaluation of RMES, an Automated Software Tool Utilizing AI, for Literature Screening with Reference to Published Systematic Reviews as Case-Studies: Development and Usability Study. JMIR Formative Research 2024;8:e55827 View
Bailey R, MacFarlane A, Field M, Tagkopoulos I, Baranzini S, Edwards K, Rose C, Schork N, Singhal A, Wallace B, Fisher K, Markakis K, Stover P, Bovell-Benjamin A. Artificial intelligence in food and nutrition evidence: The challenges and opportunities. PNAS Nexus 2024;3(12) View
Liu Z, Chai Y, Li J. Toward Automated Simulation Research Workflow through LLM Prompt Engineering Design. Journal of Chemical Information and Modeling 2025;65(1):114 View
Janumpally R, Nanua S, Ngo A, Youens K. Generative artificial intelligence in graduate medical education. Frontiers in Medicine 2025;11 View
Hua Y, Beam A, Chibnik L, Torous J. From statistics to deep learning: Using large language models in psychiatric research. International Journal of Methods in Psychiatric Research 2025;34(1) View
Schrager S, Seehusen D, Sexton S, Richardson C, Neher J, Pimlott N, Bowman M, Rodríguez J, Morley C, Li L, Dera J. Use of AI in family medicine publications: a joint editorial from journal editors. Evidence-Based Practice 2025;28(1):1 View
Chen H, Jiang Z, Liu X, Xue C, Yew S, Sheng B, Zheng Y, Wang X, Wu Y, Sivaprasad S, Wong T, Chaudhary V, Tham Y. Can large language models fully automate or partially assist paper selection in systematic reviews?. British Journal of Ophthalmology 2025;109(8):962 View
Fleurence R, Bian J, Wang X, Xu H, Dawoud D, Higashi M, Chhatwal J. Generative Artificial Intelligence for Health Technology Assessment: Opportunities, Challenges, and Policy Considerations: An ISPOR Working Group Report. Value in Health 2025;28(2):175 View
Schrager S, Seehusen D, Sexton S, Richardson C, Neher J, Pimlott N, Bowman M, Rodríguez J, Morley C, Li L, DomDera J. Use of AI in family medicine publications: a joint editorial from journal editors. Family Medicine and Community Health 2025;13(1):e003238 View
Rafiq K, Beery S, Palmer M, Harchaoui Z, Abrahms B. Generative AI as a tool to accelerate the field of ecology. Nature Ecology & Evolution 2025;9(3):378 View
Purewal A, Fautsch K, Klasova J, Hussain N, D'Souza R. Human versus artificial intelligence: evaluating ChatGPT’s performance in conducting published systematic reviews with meta-analysis in chronic pain research. Regional Anesthesia & Pain Medicine 2026;51(4):437 View
Cao C, Sang J, Arora R, Chen D, Kloosterman R, Cecere M, Gorla J, Saleh R, Drennan I, Teja B, Fehlings M, Ronksley P, Leung A, Weisz D, Ware H, Whelan M, Emerson D, Arora R, Bobrovitz N. Development of Prompt Templates for Large Language Model–Driven Screening in Systematic Reviews. Annals of Internal Medicine 2025;178(3):389 View
Li Y, Datta S, Rastegar-Mojarad M, Lee K, Paek H, Glasgow J, Liston C, He L, Wang X, Xu Y. Enhancing systematic literature reviews with generative artificial intelligence: development, applications, and performance evaluation. Journal of the American Medical Informatics Association 2025;32(4):616 View
Çalışkan E. Exploring possibilities and limits of ChatGPT: Usage in building design studies. Turkish Journal of Engineering 2025;9(3):490 View
Colangelo M, Guizzardi S, Meleti M, Calciolari E, Galli C. How to Write Effective Prompts for Screening Biomedical Literature Using Large Language Models. BioMedInformatics 2025;5(1):15 View
Iizumi T, Ono Y, Takimoto T, Chaogejilatu. Crop phenology data extraction from research papers using a large language model. Journal of Agricultural Meteorology 2025;81(2):112 View
Sujau M, Wada M, Vallée E, Hillis N, Sušnjak T. Accelerating Disease Model Parameter Extraction: An LLM-Based Ranking Approach to Select Initial Studies for Literature Review Automation. Machine Learning and Knowledge Extraction 2025;7(2):28 View
Brodsky V, Ullah E, Bychkov A, Song A, Walk E, Louis P, Rasool G, Singh R, Mahmood F, Bui M, Parwani A. Generative Artificial Intelligence in Anatomic Pathology. Archives of Pathology & Laboratory Medicine 2025;149(4):298 View
Dietrich E. Artificial intelligence in key pricing, reimbursement, and market access (PRMA) processes: better, faster, cheaper—can you really pick two?. Journal of Medical Economics 2025;28(1):586 View
Nitturi V, Flores A, Bauer D. Using Natural Language Processing to Automate Screening of Abstracts for Neurosurgical Guideline Creation. Neurosurgery 2025;97(3):736 View
Boyle A, Huo B, Sylla P, Calabrese E, Kumar S, Slater B, Walsh D, Vosburg R. Large language model-generated clinical practice guideline for appendicitis. Surgical Endoscopy 2025;39(6):3539 View
Nykvist B, Macura B, Xylia M, Olsson E. Testing the utility of GPT for title and abstract screening in environmental systematic evidence synthesis. Environmental Evidence 2025;14(1) View
Clark J, Barton B, Albarqouni L, Byambasuren O, Jowsey T, Keogh J, Liang T, Moro C, O’Neill H, Jones M. Generative artificial intelligence use in evidence synthesis: A systematic review. Research Synthesis Methods 2025;16(4):601 View
Reason T, Klijn S, Rawlinson W, Benbow E, Langham J, Teitsson S, Johannesen K, Malcolm B. Using Generative Artificial Intelligence in Health Economics and Outcomes Research: A Primer on Techniques and Breakthroughs. PharmacoEconomics - Open 2025;9(4):501 View
Ramchandani R, Guo E, Rakab E, Rathod J, Strain J, Klement W, Shorr R, Williams E, Jones D, Gilbert S. Validation of automated paper screening for esophagectomy systematic review using large language models. PeerJ Computer Science 2025;11:e2822 View
Scherbakov D, Hubig N, Jansari V, Bakumenko A, Lenert L. The emergence of large language models as tools in literature reviews: a large language model-assisted systematic review. Journal of the American Medical Informatics Association 2025;32(6):1071 View
Trad F, Yammine R, Charafeddine J, Chakhtoura M, Rahme M, El-Hajj Fuleihan G, Chehab A. Streamlining systematic reviews with large language models using prompt engineering and retrieval augmented generation. BMC Medical Research Methodology 2025;25(1) View
Vallamchetla S, Abdelkader O, Elnaggar A, Ramadan D, Islam Shourav M, Riaz I, Lin M. Do it faster with PICOS: Generative AI-Assisted systematic review screening. Journal of Biomedical Informatics 2025;168:104860 View
Zhuang Z, Chen J, Xu H, Jiang Y, Lin J. Large language models for automated scholarly paper review: A survey. Information Fusion 2025;124:103332 View
Rokhshad R, Bagherianlemraski M, Ehsani S, Haghighat S, Schwendicke F. Large language models for the screening step in systematic reviews in dentistry. Journal of Dentistry 2025;160:105877 View
Yu Z, Fang L, Ding Y, Shen Y, Xu L, Cai Y, Liu X. Evaluating large language models for information extraction from gastroscopy and colonoscopy reports through multi-strategy prompting. Journal of Biomedical Informatics 2025;168:104844 View
Fleurence R, Wang X, Bian J, Higashi M, Ayer T, Xu H, Dawoud D, Chhatwal J. A Taxonomy of Generative Artificial Intelligence in Health Economics and Outcomes Research: An ISPOR Working Group Report. Value in Health 2025;28(11):1601 View
Oami T, Okada Y, Nakada T. Optimal large language models to screen citations for systematic reviews. Research Synthesis Methods 2025;16(6):859 View
Zhang Z, Momeni Nezhad M, Gupta P, Zolnour A, Azadmaleki H, Topaz M, Zolnoori M. Enhancing AI for citation screening in literature reviews: Improving accuracy with ensemble models. International Journal of Medical Informatics 2025;203:106035 View
de Bruin J, Lombaers P, Kaandorp C, Teijema J, van der Kuil T, Yazan B, Dong A, van de Schoot R. ASReview LAB v.2: Open-source text screening with multiple agents and a crowd of experts. Patterns 2025;6(7):101318 View
Lee S, Park J, Kang K, Han S, Kim S, Seo M. Development of a Deep Learning-Based Model to Predict KeratinoSens™ Activity for Screening Potential Skin Sensitizers. Journal of Environmental Health Sciences 2025;51(3):137 View
Homiar A, Thomas J, Ostinelli E, Kennett J, Friedrich C, Cuijpers P, Harrer M, Leucht S, Miguel C, Rodolico A, Kataoka Y, Takayama T, Yoshimura K, So R, Tsujimoto Y, Yamagishi Y, Takagi S, Sakata M, Bašić Đ, Karyotaki E, Potts J, Salanti G, Furukawa T, Cipriani A. Development and evaluation of prompts for a large language model to screen titles and abstracts in a living systematic review. BMJ Mental Health 2025;28(1):e301762 View
Bayani A, Epoh Ewane L, Oliveira dos Anjos D, Mac-Seing M, Nikiema J. Leveraging open-source large language models (LLMs) in scoping reviews: a case study on disability and AI applications. International Journal of Medical Informatics 2025;204:106048 View
Kwon C. Using artificial intelligence for the development of a living evidence map: The pharmacopuncture example. Integrative Medicine Research 2025;14(4):101217 View
Rüland A, Andersen L, Hassen A, Kinyanjui C, Ralfs A, Grisci B. Science diplomacy: A global research field? Findings from a bibliometric analysis of the science diplomacy scholarship of the past twenty years. Scientometrics 2025;130(8):4697 View
Xu S, Zhao Z, Liu X, Meng X. A comparative study of screening performance between abstrackr and GPT models: Systematic review and contextual analysis. BMC Medical Informatics and Decision Making 2025;25(1) View
Graumann O, Cui Xin W, Goudie A, Blaivas M, Braden B, Campbell Westerway S, Chammas M, Dong Y, Gilja O, Hsieh P, Jiang Tian A, Liang P, Möller K, Nolsøe C, Săftoiu A, Dietrich C. Artificial Intelligence in Abdominal, Gynecological, Obstetric, Musculoskeletal, Vascular and Interventional Ultrasound. Ultrasound in Medicine & Biology 2025;51(11):1865 View
Shanmugam D, Agrawal M, Movva R, Chen I, Ghassemi M, Jacobs M, Pierson E. Generative Artificial Intelligence in Medicine. Annual Review of Biomedical Data Science 2025;8(1):199 View
Pan C, Lu W, Chen B, Zhang G, Yang Z, Hao J. Automated Literature Screening for Hepatocellular Carcinoma Treatment Through Integration of 3 Large Language Models: Methodological Study. JMIR Medical Informatics 2025;13:e76252 View
Rashid M, Yi C, Sathapanasiri T, Udayachalerm S, Boonpattharatthiti K, Insuk S, Veettil S, Lai N, Chaiyakunapruk N, Dhippayom T, Rashid M, Cheng S, Ming Lai N, Lawin S, Limhensin P, Wechkunanukul K, Mayang N, Rattanachaisit N, Ye X. Role of Generative Artificial Intelligence in Assisting Systematic Review Process in Health Research: A Systematic Review. Value in Health 2025;28(11):1665 View
Fleurence R, Dawoud D, Bian J, Higashi M, Wang X, Xu H, Chhatwal J, Ayer T. ELEVATE-GenAI: Reporting Guidelines for the Use of Large Language Models in Health Economics and Outcomes Research: An ISPOR Working Group Report. Value in Health 2025;28(11):1611 View
Ruan M, Fan J, Liu M, Meng Z, Zhang X, Zhang C. Artificial intelligence for the science of evidence synthesis: how good are AI-powered tools for automatic literature screening?. BMC Medical Research Methodology 2025;25(1) View
Dogra S, Arabshahi S, Wei J, Hu E, Saidenberg L, Sharma S, Gu Z, Siriruchatanon M, Kang S. Evaluating Large Language Models for Radiology Systematic Review Title and Abstract Screening. Academic Radiology 2025;32(12):7023 View
Insuk S, Boonpattharatthiti K, Booncharoen C, Chaipitak P, Rashid M, Veettil S, Lai N, Chaiyakunapruk N, Dhippayom T. How Well Do ChatGPT and Claude Perform in Study Selection for Systematic Review in Obstetrics. Journal of Medical Systems 2025;49(1) View
Asadi A, Zhang Y, Said H. What do personas say about privacy & security: a systematic literature review through human-AI collaboration. Journal of Ambient Intelligence and Humanized Computing 2025;16(11-12):1175 View
Emilova Doneva S, de Viragh S, Hubarava H, Schandelmaier S, Briel M, Ineichen B. StudyTypeTeller—Large language models to automatically classify research study types for systematic reviews. Research Synthesis Methods 2025;16(6):1005 View
Salehin I, Tomal Ahmed Sajib M, Huda Badhon N, Sakibul Hassan Rifat M, Amin N, Nessa Moon N. Systematic Literature Review of LLM‐Large Language Model in Medical: Digital Health, Technology and Applications. Engineering Reports 2025;7(9) View
Buddenhagen C, Bourdôt G, Lamoureaux S, Noble A, Dawson M, Phillips C. Validating a rapid algorithmic weed hazard ranking method. Pest Management Science 2026;82(1):375 View
Drastig K, Singh R. Review of Water Use Assessment in Livestock Production Systems and Supply Chains. Water 2025;17(19):2819 View
Shen J, Luo Z, Jia D, Wang S, Sun F, Wu J. Large language model enhanced framework for systematic reviews and meta-analyses. BMJ Digital Health & AI 2025;1(1):e000017 View
Lee K, Paek H, Ofoegbu N, Rube S, Higashi M, Dawoud D, Xu H, Shi L, Wang X. A4SLR: An Agentic Artificial Intelligence-Assisted Systematic Literature Review Framework to Augment Evidence Synthesis for Health Economics and Outcomes Research and Health Technology Assessment. Value in Health 2025;28(11):1655 View
van de Schoot R, Messina Coimbra B, Evenhuis T, Lombaers P, Weijdema F, de Bruin L, Neeleman R, Grandfield E, Sijbrandij M, Teijema J, Jalsovec E, Bron M, Winter S, de Bruin J, van Zuiden M. The hunt for the last relevant paper: blending the best of humans and AI. European Journal of Psychotraumatology 2025;16(1) View
Emami M, Shirani M. Comparing the performance of ChatGPT, DeepSeek, and Gemini in systematic and umbrella review tasks over time. The Journal of the American Dental Association 2025;156(12):1014 View
Hu B, Tomini E, Corrin T, Pussegoda K, Sandner E, Henriques A, Simniceanu A, Fontana L, Wagner A, Brazeau S, Waddell L. Enhancing Evidence Synthesis Efficiency: Leveraging Large Language Models and Agentic Workflows for Optimized Literature Screening. Cochrane Evidence Synthesis and Methods 2025;3(6) View
Cheng P, Hu F, Chen L, Liu J, Wu J, Chen W. Generative artificial intelligence in ophthalmology research writing: A comprehensive review of applications, detection tools, and ethical considerations. Taiwan Journal of Ophthalmology 2026;16(1):68 View
Xu Z, Zhuang X, Ma S. Machine Learning-Assisted Systematic Review: A Case Study in Learning Analytics. Education Sciences 2025;15(11):1488 View
Kolagani N, Glynn P, Voinov A, Quinn N, Helgeson J, Dyckman C. Participatory modeling in the AI era. Environmental Modelling & Software 2026;196:106762 View
Tinajero C. The Pediatric Surgeon's AI Toolbox: How Large Language Models Like ChatGPT Are Simplifying Practice and Expanding Global Access. European Journal of Pediatric Surgery 2026;36(03):190 View
Sciurti A, Migliara G, Siena L, Isonne C, De Blasiis M, Sinopoli A, Iera J, Marzuillo C, De Vito C, Villari P, Baccolini V. Compact large language models for title and abstract screening in systematic reviews: An assessment of feasibility, accuracy, and workload reduction. Research Synthesis Methods 2026;17(2):332 View
Zhou Y, Hu H. Can AI assess literature like experts? An entropy-based comparison of ChatGPT-4o, DeepSeek R1, and human ratings. Frontiers in Research Metrics and Analytics 2025;10 View
Pratte M, Thirukumar S, Zhang C, Slessarev M, Basmaji J, Prager R. Can large language models approximate the results of meta-analyses in critical care? A meta-research study. Journal of Critical Care 2026;92:155358 View
Gartlehner G, Nussbaumer‐Streit B, Hamel C, Garritty C, Griebler U, King V, Devane D, Kamel C. Responsible Integration of Artificial Intelligence in Rapid Reviews: A Position Statement From the Cochrane Rapid Reviews Methods Group. Cochrane Evidence Synthesis and Methods 2025;3(6) View
Hodgkinson I, Barth M, Dornack C. Mapping Life Cycle Assessment Methods for Components of Carbon Fibre Metal Laminates: A Systematic and AI-Based Review of Aluminium, Carbon Fibre, and Epoxy Resin. Sustainability 2025;17(23):10445 View
Lai H, Ma N, Zhao W, Liu J, Pan B, Tian J, Chen Y, Ma B, Shang H, Liu J, Bian Z, Wu D, Sun X, Du L, Zhang J, Liu X, Zeng F, Sun F, Zhang B, Jin Y, Xia J, Shi N, Liu Q, Yang K, Ge L, Huang L. Digital intelligent evidence-based medicine: new paradigm for evidence-based research and practice in the AI era. Chinese Science Bulletin 2026;71(2):449 View
Cassell K, Ologunowa A, Rastegar-Mojarad M, Chun B, Huang Y, Wang D, Cossrow N. Analysis of article screening and data extraction performance by an AI systematic literature review platform. Frontiers in Artificial Intelligence 2025;8 View
Im E, Kim B, Kang S, Kim H. Rapid Clinical Evidence Explorer: A Generative Pre-Trained Transformer–Powered Tool for Automated Oncology Evidence Extraction. JCO Clinical Cancer Informatics 2025;(9) View
Peltonen L, Topaz M, Zhang Z. From Research to Practice in Days, not Decades: Why Leaders Must Act now. Journal of Medical Systems 2025;49(1) View
Hilkenmeier F, Pelzer M, Stierle C, Fink-Lamotte J. Evaluating the AI Tool “Elicit” as a Semi-Automated Second Reviewer for Data Extraction in Systematic Reviews: A Proof-of-Concept. Social Science Computer Review 2025 View
Shae W, Islam Saif M, Fife J, Mudaranthakam D, Pei D, Harlan-Williams L, Thompson J, Koestler D. Utilizing natural language processing to identify cancer-relevant publications at a National Cancer Institute-designated cancer center. JAMIA Open 2025;8(6) View
Kwon M, Jang K, Baek S, Han Y, Choi H, Lee I, Kim J. Ophtimus-V2-Tx: a compact domain-specific LLM for ophthalmic diagnosis and treatment planning. Scientific Reports 2025;15(1) View
Shen P, Yuan Y, He X, Wang F. Beyond technical efficacy: challenges and critical concerns of large language model’s impact on medical education in China: a systematic review. Global Medical Education 2025;2(1):113 View
Nordmann K, Fischer F. Harnessing ChatGPT for abstract screening in health-related scoping reviews: the role of structured eligibility criteria. BMC Health Services Research 2025;26(1) View
Sakai H, Lam S. Large Language Models for Health Care Text Classification: Systematic Review. JMIR AI 2026;5:e79202 View
Nordmann K, Schaller M, Sauter S, Fischer F. Capability of chatbots powered by large language models to support the screening process of scoping reviews: a feasibility study. JAMIA Open 2026;9(1) View
Valencia-Coronel B, Marescaux J, Gimenez M. Artificial intelligence-based tools for optimizing surgical research publications. Artificial Intelligence Surgery 2026;6(1):36 View
Sohoni N, Sohoni N, Sutherland R, Sundaresan V, Smani S, Ananth P, Onofrey J, Aneja S, Miszczyk M, Lee H, Olivieri J, Leapman M. Automated Classification of Adverse Events After Hydrogel Perirectal Spacer Insertion for Prostate Cancer Using Large Language Models. Urology 2026;209:18 View
Knapen D, van Kruchten M, de Groot D, Broekman K, Fehrmann R. Artificial intelligence for clinical trial design, conduct, and analysis: a narrative review. ESMO Real World Data and Digital Oncology 2026;11:100682 View
Oami T, Okada Y, Nakada T. Automated systematic reviews using machine learning and large language models in clinical practice guideline development: A perspective. Hong Kong Journal of Emergency Medicine 2026;33(1) View
Xu Z, Ma S, Zhuang X, Adeyemi A, Kogut A. Machine learning-assisted abstract screening on learning analytics: a step-by-step tutorial. Systematic Reviews 2026;15(1) View
Akinseloyin O, Jiang X, Palade V. Large language model-based multiagent collaboration for abstract screening toward automated systematic reviews. Biology Methods and Protocols 2026;11(1) View
Li H, Tang W, Leung H, Gao Z, Chan K, Lou V, Heyn P. Tools for dementia disclosure: a systematic review of family caregivers’ perspectives and experiences. The Gerontologist 2026;66(4) View
Davis R, List S, Chappell K, Madar A, Henjum S, Heen E. Evaluating the performance of a custom GPT in full text screening of a systematic review. Scandinavian Journal of Public Health 2026 View
Li Y, Plasek J, Du X, Wang Y, Zhou Z, Lian J, Chuang Y, Hong P, Hou P, Zhou L. Interactive active learning for literature screening: finetuning GPT with DeepSeek reasoning for cross-domain generalization. Journal of the American Medical Informatics Association 2026;33(5):1026 View
Wang J, Ling H, Zhang L, Zhang L, Wang F, Gao Y, Li Z. CKD-EHR: Clinical Knowledge Distillation for Electronic Health Records. Biomedical Signal Processing and Control 2026;120:110090 View
Gong E, Bang C, Shin Y. Applications of Large Language Models in Medical Research: From Systematic Reviews to Clinical Studies. Bioengineering 2026;13(3):365 View
Luo J, Yu J, Cai Z, Xu L, Lv Y, Zheng Q, Li L. Innovative applications of artificial intelligence technology in pharmacometrics. European Journal of Clinical Pharmacology 2026;82(4) View
Leon C, Kudelka M. Auditing GenAI Literature Search Workflows: A Replicable Protocol for Traceable, Accountable Retrieval in Student-Facing Inquiry. AI in Education 2026;2(2):8 View
Bentegeac R, Le Guellec B, Leblanc V, Lenain R, Dauchet L, Gauthier V, Gerard E, Chazard E, Amouyel P, Aymes E, Hamroun A. BibliZap: An exploratory evaluation of an automated multi-level citation searching tool for systematic and rapid reviews. Research Synthesis Methods 2026;17(4):816 View
Chenggong X, Keying H, Zhengquan D, Xinyi H, Bin W. Interdisciplinary integration and development trends of intelligent diagnosis in traditional Chinese medicine: a topic evolution analysis. Digital Chinese Medicine 2026;9(1):43 View
Zhou F, Afzal M, Saha A, Parrish R, Haynes R, Iorio A, Lokker C. Zero-shot interpretable biomedical literature appraisal with generative large language models. JAMIA Open 2026;9(2) View
Mitrov G, Stanoev B, Trajkovik V, Risteska Stojkoska B, Basnarkov L, Lameski P, Kampel M, Zdravevski E. Optimizing document retrieval using massive text embeddings and LLM prompt engineering. Systematic Reviews 2026;15(1) View
Vivekanantha P, Kahlon H, Balogun O, Bouchard M, de Sa D, Ayeni O, Kay J. Automated data extraction for systematic reviews using GPT‐5.2 and Google Gemini Pro 3: A dual‐large language model approach in orthopaedic research. Knee Surgery, Sports Traumatology, Arthroscopy 2026 View
Song Z, Huang S, Thapa N, Zhang X, Park B, Lu J, Li W, Liu W, Zhan B, Li J. Large language model-based paper classification framework with key-insight extraction and confidence-weighted voting. Research Synthesis Methods 2026:1 View
Rodríguez S, León-Prieto C, Rodríguez-Jaime M. Artificial intelligence versus human consensus: A concordance analysis in the screening of studies for evidence synthesis in physical activity and sport. Journal of Bodywork and Movement Therapies 2026;47:371 View
Radeva I, Noncheva T, Doukovska L, Popchev I. Comparing Single-Agent and Multi-Agent Strategies in LLM-Based Title-Abstract Screening. Electronics 2026;15(8):1661 View
Tolend M, Halabi R, Ghaouari K, Lau Y, Alda M, Hintze A, Mulsant B, Ortiz A. Novel Abstract Screening Algorithm Using Delphi-Inspired Large Language Model Consensus for Systematic Reviews in Psychiatry: Nouvel algorithme de sélection des résumés utilisant un consensus issu d’un grand modèle de langage inspiré de la méthode Delphi pour les revues systématiques en psychiatrie. The Canadian Journal of Psychiatry 2026 View
Buitrago-Garcia D, Courvoisier D, Capderou S, Iudici M, Mongin D. Performance of Zero-Shot Classifiers for Categorizing RCT Abstracts by Intervention Type: Validation Study. JMIR Medical Informatics 2026;14:e77943 View
Rauch A, Frese M, Tabilo M. Meta-analyses in entrepreneurship research: the state of the art and future directions for validity assessment. Foundations and Trends® in Entrepreneurship 2026;22(1):1 View
Yao Y, Liu H, Yang D, Luo X, Lai H, Wang Z, Chen Y, Bian Z. Integration of large language models and evidence-based Chinese medicine: A scoping review. Integrative Medicine Research 2026;15(3):101349 View
Keenan P, Heavin C. Beyond manual review: using LLMs to systematically map five decades of IFIP WG8.3 decision-support research. Journal of Decision Systems 2026;35(1) View
Hu H, Qu Z, Liu Y, Zhu L, Mei Z, Chen X. AI-driven fungicide design: From target identification to field application. Plant Communications 2026;7(5):101850 View
Carrier R, Lopez L, Moya A, Hernandez V, D’Apuzzo M, McNamara C. Evaluating the Utility of Artificial Intelligence in Conducting Systematic Reviews. Arthroplasty Today 2026;39:102035 View
Ye Y, Colombo M, Meessen J, Hajizadeh N, De Vore L, Sigarchian H, Peters C, Krovvidi S, Meresh S, Hasson L, Jakka S, Tyl B, Barandiaran A, Folin B, Novelli D, Mueller-Wieland D, Buldurluoglu E, Pardali K, Bassi M, Cinquini M, Lavazza C, Pavo N, Balfanz P, Latini R, Klas S, Balosso S. Automated full-text screening and accelerated reviews using large language models with context-aware agents: an exploratory analysis in biomarker research. European Heart Journal - Digital Health 2026;7(5) View
Kwok W, Wallbank G, Hodgson P, Schrader T, Shao L, Elkins M, Fandim J, Scott J, Sherrington C, Traeger A. Automated approaches to identifying clinical trials based on title and abstract in the field of physiotherapy: a comparative analysis. Journal of Clinical Epidemiology 2026;196:112309 View
Pescol F, Buonocore T, Tibollo V, Failla G, Traversi E, La Rovere M, Sacchi L, Ricciardi W, Bellazzi R. Evaluating large language models for structuring cardiology reports: a real-world clinical study on patient subtyping and trial recruitment. International Journal of Medical Informatics 2026;217:106509 View
Meneses‐Echavez J, Alonso‐Coello P, Vist G, Bracchiglione J, Song Y, Flottorp S, Rosenbaum S. Evidence‐to‐Decision Frameworks: Enhancing the Quality and Rigour of Guidelines and Recommendations. Clinical and Public Health Guidelines 2026;3(3)
Madeyski L, Kitchenham B, Shepperd M. LLM4SCREENLIT: Recommendations on assessing the performance of large language models for screening literature in systematic reviews. Information and Software Technology 2026;198:108204
Aigner M, Ganzinger M, Probst P, Rinckens M, Pausch T. Comparing supervised machine learning and large language models in title-abstract screening. Systematic Reviews 2026;15(1)
Birene B, Popoff B, Graesslin O, Vuiblet V, Morel O. De PubMed à GPT : une revue narrative du rôle émergent des grands modèles de langage pour les revues systématiques. Gynécologie Obstétrique Fertilité & Sénologie 2026
Wang D, Datta S, Glasgow J, Lee K, Paek H, Zhang J, Zheng Y, Huang Y, He L, Rastegar-Mojarad M, Cassell K, Wang X, Cossrow N. AI-Assisted Systematic Literature Review of the Economic Burden of Pneumococcal Disease: Development and Validation Study. JMIR AI 2026;5:e81049
Hilkenmeier F, Stoltenberg M, Stierle C. Using full agreement across multiple large language models for title-and-abstract screening in systematic reviews: a proof-of-concept. Systematic Reviews 2026;15(1)
Cifuentes-González C, Singer M, Rojas-Carabali W, Mejía-Salgado G, Cicinelli M, Biswas J, Gangaputra S, de-la-Torre A, Gupta V, Pulido J, Agrawal R. Transforming Systematic Reviews: Evaluating a Fine-Tuned Large Language Model for Abstract Screening in Uveitis and Retinal Vasculitis. Ophthalmology Science 2026:101296
Józwiak Á, Imre A, Hagymásy J, Tittmann J, Nagy Á, Kovács S, Kardas P, van Boven J, Mommers I, Ágh T. REFLECTIVE-TIAB: cost-effective prompt optimization for large language model-based title and abstract screening in literature reviews. Expert Review of Pharmacoeconomics & Outcomes Research 2026;26(6):747
Liu Y, Yang R, Liew J, Yin Z, Foote H, Lindsell C, Hong C. Leveraging LLMs for Title and Abstract Screening for Systematic Review: A Cost-effective Dynamic Few-shot Learning Approach. Journal of Healthcare Informatics Research 2026
Knauer M, Greß J, Kather J, May P. Augmenting Oncology Guideline Maintenance with Large Language Models: A Prospective Case Study (Preprint). JMIR AI 2026
Gartlehner G, Banda S, Callaghan M, Chase J, Dobrescu A, Eisele-Metzger A, Flemyng E, Gardner S, Griebler U, Helfer B, Jemiolo P, Macura B, Minx J, Noel-Storr A, Tahmasebi N, Sharifan A, Meerpohl J, Thomas J. Cochrane evaluation of (semi-)automated review methods: protocol for an adaptive platform study within reviews. Journal of Clinical Epidemiology 2026;198:112390
Jang W, Sultana S, Yao Z, Tran H, Yang Z, Kwon S, Yu H. Enhancing Large Language Models for Identifying and Prioritizing Important Medical Jargons From Electronic Health Record Notes Using Data Augmentation: Comparative Study. JMIR AI 2026;5:e75561

Books/Policy Documents

Keenan P, Heavin C. Decision Support Systems XVI. Decision Support System Technology for Sustainable and Resilient Transitions.
Melo P, Guadagno E, Poenaru D. Artificial Intelligence in Medicine.

Conference Proceedings

Huotala A, Kuutila M, Ralph P, Mäntylä M. Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering. The Promise and Challenges of Using LLMs to Accelerate the Screening Process of Systematic Reviews
Felizardo K, Lima M, Deizepe A, Conte T, Steinmacher I. Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement. ChatGPT application in Systematic Literature Reviews in Software Engineering: an evaluation of its accuracy to support the selection activity
Sandner E, Hu B, Simiceanu A, Fontana L, Jakovljevic I, Henriques A, Wagner A, Gütl C. 2024 2nd International Conference on Foundation and Large Language Models (FLLM). Screening Automation for Systematic Reviews: A 5-Tier Prompting Approach Meeting Cochrane’s Sensitivity Requirement
Rahman M, Al-Hazzaa S. GLOBECOM 2024 - 2024 IEEE Global Communications Conference. Next-Generation Virtual Hospital: Integrating Discriminative and Large Multi-Modal Generative AI for Personalized Healthcare
Ogdu C, Gurbuz S, Karakose M, Hanoglu E. 2025 29th International Conference on Information Technology (IT). Medical Implications of LLM Based Clinical Decision Support Systems in Healthcare
Mao X, Leelanupab T, Potthast M, Scells H, Zuccon G. Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. AiReview: An Open Platform for Accelerating Systematic Reviews with LLMs
Soprano M, Modha S, Roitero K, Maddalena E, Viviani M, Pasi G, Mizzaro S. Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR). AIDME: A Scalable, Interpretable Framework for AI-Aided Scoping Reviews
Köhler J, Harl M, Westner M, Strahringer S. 2025 27th International Conference on Business Informatics (CBI). Can AI be a Scholar? A Systematic Review on the Role of Generative AI in Systematic Literature Reviews
Tetz L, Capitaine L, Kobayashi R, Jagusch B, Nguyen T, Lux T. 2025 IEEE/ACS 22nd International Conference on Computer Systems and Applications (AICCSA). Integrating Gender-Sensitive Data into Clinical AI Systems: A Proof of Concept for Inclusive Healthcare
Huotala A, Kuutila M, Mäntylä M. 2025 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). SESR-Eval: Dataset for Evaluating LLMs in the Title-Abstract Screening of Systematic Reviews
Sandner E, Negovetić M, Kothari K, Taj I, Fontana L, Henriques A, Jakovljević I, Simniceanu A, Wagner A, Gütl C. 2025 3rd International Conference on Foundation and Large Language Models (FLLM). Cal-X: Enhancing Systematic Review Screening with LLMs and Next-Token Likelihood Calibration
Vaddi S, M. Sparshitha, Sreeja M, Tejaswini S. 2026 IEEE International Conference on Emerging Computing and Intelligent Technologies (ICoECIT). Smart Complaint-Driven Root Cause Detection and Alert System for Supply Chain
Verduin M, Lodi V, Reis J. Anais do XXVI Simpósio Brasileiro de Computação Aplicada à Saúde (SBCAS 2026). Abordagens de Aprendizado de Máquina para Automatização de Etapas do Processo de Meta-Análise no Contexto da Saúde

Citation

Please cite as:

Guo E, Gupta M, Deng J, Park YJ, Paget M, Naugler C
Automated Paper Screening for Clinical Reviews Using Large Language Models: Data Analysis Study
J Med Internet Res 2024;26:e48996
doi: 10.2196/48996 PMID: 38214966 PMCID: 10818236

This paper is in the following e-collections/theme issues:

Generative Language Models Including ChatGPT; Natural Language Processing; Chatbots and Conversational Agents; Artificial Intelligence
