Advancing Evidence-Based Medicine: A Large Language Model Approach to PICO Element Recognition and Extraction in Medical Literature
Date Submitted: Jan 11, 2026
Open Peer Review Period: Jan 12, 2026 - Mar 9, 2026
Objective: The exponential growth of biomedical literature has created an urgent need for efficient methods to recognize and extract PICO (Population, Intervention, Comparison, Outcome) elements, the foundation of evidence-based medicine (EBM). This study systematically evaluates two complementary approaches for automating PICO recognition and extraction in medical literature: prompt engineering optimization and parameter-efficient fine-tuning of large language models (LLMs).

Methods: We developed a dual-phase methodological framework: (1) systematic prompt optimization incorporating In-Context Learning (ICL), Chain-of-Thought (CoT), and Tree-of-Thought (ToT) reasoning strategies; and (2) parameter-efficient fine-tuning (PEFT) of LLMs using Low-Rank Adaptation (LoRA), Quantized LoRA (QLoRA), and Freeze techniques. The PubMed-PICO and NICTA-PIBOSO benchmark datasets were used for recognition tasks, and EBM-NLP for extraction tasks. Performance metrics include precision, recall, and F1-score; F1 was adopted as the primary metric because it balances precision and recall.

Results: CoT prompting demonstrated superior recognition accuracy, achieving F1-scores of 77.1% (Population) and 84.5% (Outcome) on PubMed-PICO. Among PEFT implementations, LoRA achieved peak classification performance (91.7% F1 for Population), while QLoRA showed the best extraction capability (79.3% F1 for Intervention). Fine-tuned models established new benchmarks across all datasets, attaining state-of-the-art (SOTA) results on NICTA-PIBOSO and EBM-NLP. PEFT demonstrated marked improvements over prompt engineering.

Conclusion: Our findings indicate that LLMs can effectively automate PICO recognition and extraction through two complementary approaches. First, prompt engineering allows the model to perform tasks directly without altering its internal parameters.
Second, PEFT further unlocks the model's performance potential by adding fine-tuning on top of prompt engineering. This work makes significant advances and provides critical insights for optimizing methodological approaches in clinical applications involving PICO recognition and extraction.
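The CoT prompting phase described above can be illustrated with a minimal sketch. The study's actual prompt wording and model interface are not given here, so the template, label set formatting, and function names below are assumptions, not the paper's implementation:

```python
# Minimal sketch of a Chain-of-Thought (CoT) prompt for PICO sentence
# recognition. The template text is illustrative only; in practice the
# returned string would be sent to whatever LLM API the pipeline uses.

PICO_LABELS = ["Population", "Intervention", "Comparison", "Outcome", "Other"]

def build_cot_prompt(sentence: str) -> str:
    """Assemble a CoT prompt asking the model to reason step by step
    before assigning one PICO label to a single abstract sentence."""
    return (
        "You are labeling sentences from randomized-trial abstracts.\n"
        f"Possible labels: {', '.join(PICO_LABELS)}.\n"
        "Think step by step: (1) What is the sentence describing? "
        "(2) Which PICO element does that correspond to? "
        "(3) State the final label on its own line as 'Label: <name>'.\n\n"
        f"Sentence: {sentence}"
    )

prompt = build_cot_prompt(
    "Patients aged 40-65 with type 2 diabetes were randomized to metformin."
)
```

An ICL variant would prepend a few labeled example sentences before the target sentence; a ToT variant would instead elicit and compare several candidate reasoning paths.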
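LoRA, the strongest PEFT method for the classification results above, freezes the pretrained weight matrix W and learns only a low-rank update BA scaled by alpha/r. A toy NumPy sketch of that idea (dimensions and initialization chosen arbitrarily, not taken from the paper):

```python
import numpy as np

# Toy illustration of the Low-Rank Adaptation (LoRA) update: a frozen
# weight matrix W (d_out x d_in) is augmented with a trainable low-rank
# product B @ A, scaled by alpha / r. All dimensions here are arbitrary.

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 4        # rank r << min(d_out, d_in)

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weights
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, init 0

def lora_forward(x: np.ndarray) -> np.ndarray:
    """Effective forward pass: frozen W @ x plus the scaled low-rank update."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B initialized to zero, the adapted output equals the frozen one,
# so training starts from the pretrained model's behavior.
assert np.allclose(lora_forward(x), W @ x)

# Only A and B are trained: r*(d_in + d_out) parameters instead of
# d_out*d_in for a full update (here 32 vs 64).
trainable = r * (d_in + d_out)
```

QLoRA applies the same update on top of a quantized frozen W, and Freeze tuning instead updates only a subset of the original layers.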
