<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.0 20040830//EN" "journalpublishing.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="2.0" xml:lang="en" article-type="letter"><front><journal-meta><journal-id journal-id-type="nlm-ta">J Med Internet Res</journal-id><journal-id journal-id-type="publisher-id">jmir</journal-id><journal-id journal-id-type="index">1</journal-id><journal-title>Journal of Medical Internet Research</journal-title><abbrev-journal-title>J Med Internet Res</abbrev-journal-title><issn pub-type="epub">1438-8871</issn><publisher><publisher-name>JMIR Publications</publisher-name><publisher-loc>Toronto, Canada</publisher-loc></publisher></journal-meta><article-meta><article-id pub-id-type="publisher-id">v27i1e81769</article-id><article-id pub-id-type="doi">10.2196/81769</article-id><article-categories><subj-group subj-group-type="heading"><subject>Letter to the Editor</subject></subj-group></article-categories><title-group><article-title>Critical Limitations in Systematic Reviews of Large Language Models in Health Care</article-title></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name name-style="western"><surname>Weizman</surname><given-names>Zvi</given-names></name><degrees>MD, Prof Dr Med</degrees><xref ref-type="aff" rid="aff1"/></contrib></contrib-group><aff id="aff1"><institution>Faculty of Health Sciences, Ben-Gurion University</institution><addr-line>8 Balfour Street</addr-line><addr-line>Tel-Aviv</addr-line><country>Israel</country></aff><contrib-group><contrib contrib-type="editor"><name name-style="western"><surname>Leung</surname><given-names>Tiffany</given-names></name></contrib></contrib-group><author-notes><corresp>Correspondence to Zvi Weizman, MD, Prof Dr Med, Faculty of Health Sciences, Ben-Gurion University, 8 Balfour Street, Tel-Aviv, 6521120, Israel, 972 544888686; 
<email>wzvi@bgu.ac.il</email></corresp></author-notes><pub-date pub-type="collection"><year>2025</year></pub-date><pub-date pub-type="epub"><day>24</day><month>9</month><year>2025</year></pub-date><volume>27</volume><elocation-id>e81769</elocation-id><history><date date-type="received"><day>03</day><month>08</month><year>2025</year></date><date date-type="rev-recd"><day>05</day><month>08</month><year>2025</year></date><date date-type="accepted"><day>29</day><month>08</month><year>2025</year></date></history><copyright-statement>&#x00A9; Zvi Weizman. Originally published in the Journal of Medical Internet Research (<ext-link ext-link-type="uri" xlink:href="https://www.jmir.org">https://www.jmir.org</ext-link>), 24.9.2025. </copyright-statement><copyright-year>2025</copyright-year><license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (<ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. 
The complete bibliographic information, a link to the original publication on <ext-link ext-link-type="uri" xlink:href="https://www.jmir.org/">https://www.jmir.org/</ext-link>, as well as this copyright and license information must be included.</p></license><self-uri xlink:type="simple" xlink:href="https://www.jmir.org/2025/1/e81769"/><related-article related-article-type="commentary" id="e82729" ext-link-type="doi" xlink:href="10.2196/82729" xlink:title="Comment in" page="e82729" xlink:type="simple">https://www.jmir.org/2025/1/e82729</related-article><related-article related-article-type="commentary article" id="v27" ext-link-type="doi" xlink:href="10.2196/71916" xlink:title="Comment on" vol="27" xlink:type="simple">https://www.jmir.org/2025/1/e71916</related-article><kwd-group><kwd>letter</kwd><kwd>large language models</kwd><kwd>AI</kwd><kwd>health care</kwd><kwd>review</kwd><kwd>LLM</kwd><kwd>clinical</kwd><kwd>artificial intelligence</kwd><kwd>digital health</kwd></kwd-group></article-meta></front><body><sec id="s1" sec-type="intro"><title>Introduction</title><p>I read with interest the study by Li et al [<xref ref-type="bibr" rid="ref1">1</xref>] on the implementation of large language models (LLMs) in health care, which provides clinicians with guidance for selecting appropriate models for specific tasks. Although the review offers a comprehensive overview, several limitations undermine its utility for clinical decision-making.</p></sec><sec id="s2"><title>Citation Threshold Bias</title><p>The authors exclude journals below a citation threshold of 13,000, which introduces publication bias by excluding innovative research from emerging or specialized journals, as documented in the methodology literature. This is problematic in a rapidly evolving field where important innovations may first appear in newer venues. 
While the authors note that only 8.9% (24/270) of studies reported negative results, which could affect the overall perception of the clinical effectiveness of LLMs, they do not adequately account for this publication bias.</p></sec><sec id="s3"><title>Flawed Performance Definition</title><p>The definition of &#x201C;best performance&#x201D; is problematic. The authors acknowledge that performance in one context does not guarantee similar performance in other contexts and therefore state that the frequency of &#x201C;best performance&#x201D; should not be interpreted as a metric for comparing models. This acknowledgment undermines their quantitative analysis. The heterogeneity in evaluation metrics, datasets, and contexts across studies renders their performance comparisons essentially meaningless, a problem well documented in the AI literature [<xref ref-type="bibr" rid="ref2">2</xref>].</p></sec><sec id="s4"><title>Limited Quality Assessment</title><p>The review lacks a quality assessment of the included studies. Reporting guidelines for medical AI emphasize the importance of evaluating study design, validation approaches, and statistical rigor [<xref ref-type="bibr" rid="ref3">3</xref>]. The authors&#x2019; approach of simply counting &#x201C;best performance&#x201D; instances without considering study quality, sample sizes, or validation rigor represents a significant methodological weakness.</p></sec><sec id="s5"><title>Conceptual and Analytical Limitations</title><p>The 5-stage linear workflow model, while organizationally useful, oversimplifies the complex and iterative nature of clinical decision-making. 
Modern health care delivery involves parallel processes, feedback loops, and multidisciplinary coordination that this model fails to capture, thereby limiting the practical utility of its recommendations [<xref ref-type="bibr" rid="ref4">4</xref>].</p></sec><sec id="s6"><title>Insufficient Discussion of Clinical Validation</title><p>The authors give too little attention to the critical gap between research performance and clinical validation. As noted in recent systematic reviews of AI in health care, models trained and validated on research datasets face substantial deployment challenges in medical institutions because of significant differences between laboratory and clinical settings. While the authors mention this limitation, they do not adequately weigh it in their analysis.</p></sec><sec id="s7"><title>Limited Safety and Risk Analysis</title><p>Although the authors discuss ethical concerns, their analysis of patient safety remains superficial. Recent literature emphasizes the critical importance of comprehensive risk assessment in implementing medical AI, including analysis of failure modes, error propagation, and impacts on clinical decision-making [<xref ref-type="bibr" rid="ref5">5</xref>].</p></sec><sec id="s8"><title>Absence of Economic Evaluation</title><p>The review lacks a comprehensive economic evaluation of LLM implementation, including cost-effectiveness analyses, resource allocation considerations, and return-on-investment assessments. 
These limitations significantly impact the review&#x2019;s clinical applicability and highlight the need for more rigorous methodological approaches in evaluating AI in health care.</p></sec></body><back><fn-group><fn fn-type="conflict"><p>None declared.</p></fn></fn-group><glossary><title>Abbreviations</title><def-list><def-item><term id="abb1">LLM</term><def><p>large language model</p></def></def-item></def-list></glossary><ref-list><title>References</title><ref id="ref1"><label>1</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Li</surname><given-names>H</given-names> </name><name name-style="western"><surname>Fu</surname><given-names>JF</given-names> </name><name name-style="western"><surname>Python</surname><given-names>A</given-names> </name></person-group><article-title>Implementing large language models in health care: clinician-focused review with interactive guideline</article-title><source>J Med Internet Res</source><year>2025</year><month>07</month><day>11</day><volume>27</volume><fpage>e71916</fpage><pub-id pub-id-type="doi">10.2196/71916</pub-id><pub-id pub-id-type="medline">40644686</pub-id></nlm-citation></ref><ref id="ref2"><label>2</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Chang</surname><given-names>Y</given-names> </name><name name-style="western"><surname>Yin</surname><given-names>JM</given-names> </name><name name-style="western"><surname>Li</surname><given-names>JM</given-names> </name><name name-style="western"><surname>Liu</surname><given-names>C</given-names> </name><name name-style="western"><surname>Cao</surname><given-names>LY</given-names> </name><name name-style="western"><surname>Lin</surname><given-names>SY</given-names> </name></person-group><article-title>Applications and future prospects of medical LLMs: a survey based on the M-KAT conceptual framework</article-title><source>J 
Med Syst</source><year>2024</year><month>12</month><day>27</day><volume>48</volume><issue>1</issue><fpage>112</fpage><pub-id pub-id-type="doi">10.1007/s10916-024-02132-5</pub-id><pub-id pub-id-type="medline">39725770</pub-id></nlm-citation></ref><ref id="ref3"><label>3</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Liu</surname><given-names>X</given-names> </name><name name-style="western"><surname>Cruz Rivera</surname><given-names>S</given-names> </name><name name-style="western"><surname>Moher</surname><given-names>D</given-names> </name><etal/></person-group><article-title>Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension</article-title><source>Nat Med</source><year>2020</year><month>09</month><volume>26</volume><issue>9</issue><fpage>1364</fpage><lpage>1374</lpage><pub-id pub-id-type="doi">10.1038/s41591-020-1034-x</pub-id></nlm-citation></ref><ref id="ref4"><label>4</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Sittig</surname><given-names>DF</given-names> </name><name name-style="western"><surname>Singh</surname><given-names>H</given-names> </name></person-group><article-title>A new sociotechnical model for studying health information technology in complex adaptive healthcare systems</article-title><source>Qual Saf Health Care</source><year>2010</year><month>10</month><volume>19 Suppl 3</volume><issue>Suppl 3</issue><fpage>i68</fpage><lpage>74</lpage><pub-id pub-id-type="doi">10.1136/qshc.2010.042085</pub-id><pub-id pub-id-type="medline">20959322</pub-id></nlm-citation></ref><ref id="ref5"><label>5</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Sendak</surname><given-names>MP</given-names> </name><name 
name-style="western"><surname>Ratliff</surname><given-names>W</given-names> </name><name name-style="western"><surname>Sarro</surname><given-names>D</given-names> </name><etal/></person-group><article-title>Real-World Integration of a sepsis deep learning technology into routine clinical care: implementation study</article-title><source>JMIR Med Inform</source><year>2020</year><month>07</month><day>15</day><volume>8</volume><issue>7</issue><fpage>e15182</fpage><pub-id pub-id-type="doi">10.2196/15182</pub-id><pub-id pub-id-type="medline">32673244</pub-id></nlm-citation></ref></ref-list></back></article>