<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.0 20040830//EN" "http://dtd.nlm.nih.gov/publishing/2.0/journalpublishing.dtd">
<article article-type="letter" dtd-version="2.0" xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">JMIR</journal-id>
      <journal-id journal-id-type="nlm-ta">J Med Internet Res</journal-id>
      <journal-title>Journal of Medical Internet Research</journal-title>
      <issn pub-type="epub">1438-8871</issn>
      <publisher>
        <publisher-name>JMIR Publications</publisher-name>
        <publisher-loc>Toronto, Canada</publisher-loc>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="publisher-id">v26i1e56997</article-id>
      <article-id pub-id-type="pmid">38625725</article-id>
      <article-id pub-id-type="doi">10.2196/56997</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Letter to the Editor</subject>
        </subj-group>
        <subj-group subj-group-type="article-type">
          <subject>Letter to the Editor</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Evaluating GPT-4’s Cognitive Functions Through the Bloom Taxonomy: Insights and Clarifications</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="editor">
          <name>
            <surname>Leung</surname>
            <given-names>Tiffany</given-names>
          </name>
        </contrib>
      </contrib-group>
      <contrib-group>
        <contrib id="contrib1" contrib-type="author" corresp="yes">
          <name name-style="western">
            <surname>Huang</surname>
            <given-names>Kuan-Ju</given-names>
          </name>
          <degrees>MS</degrees>
          <xref rid="aff1" ref-type="aff">1</xref>
          <address>
            <institution>Department of Obstetrics and Gynecology</institution>
            <institution>National Taiwan University Hospital Yunlin Branch</institution>
            <addr-line>No 579, Sec 2, Yunlin Rd, Douliu City</addr-line>
            <addr-line>Yunlin County, 640</addr-line>
            <country>Taiwan</country>
            <fax>886 55335325</fax>
            <phone>886 55323911 ext 563413</phone>
            <email>restroomer@icloud.com</email>
          </address>
          <ext-link ext-link-type="orcid">https://orcid.org/0000-0001-9502-7000</ext-link>
        </contrib>
      </contrib-group>
      <aff id="aff1">
        <label>1</label>
        <institution>Department of Obstetrics and Gynecology</institution>
        <institution>National Taiwan University Hospital Yunlin Branch</institution>
        <addr-line>Yunlin County</addr-line>
        <country>Taiwan</country>
      </aff>
      <author-notes>
        <corresp>Corresponding Author: Kuan-Ju Huang <email>restroomer@icloud.com</email></corresp>
      </author-notes>
      <pub-date pub-type="collection">
        <year>2024</year>
      </pub-date>
      <pub-date pub-type="epub">
        <day>16</day>
        <month>4</month>
        <year>2024</year>
      </pub-date>
      <volume>26</volume>
      <elocation-id>e56997</elocation-id>
      <history>
        <date date-type="received">
          <day>1</day>
          <month>2</month>
          <year>2024</year>
        </date>
        <date date-type="accepted">
          <day>4</day>
          <month>4</month>
          <year>2024</year>
        </date>
      </history>
      <copyright-statement>©Kuan-Ju Huang. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 16.04.2024.</copyright-statement>
      <copyright-year>2024</copyright-year>
      <license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
        <p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.</p>
      </license>
      <self-uri xlink:href="https://www.jmir.org/2024/1/e56997" xlink:type="simple"/>
      <related-article related-article-type="commentary-article" id="v26i1e52113" ext-link-type="doi" xlink:href="10.2196/52113" vol="26" page="e52113" xlink:type="simple">http://www.jmir.org/2024/1/e52113/</related-article>
      <related-article related-article-type="commentary" id="v26i1e57778" ext-link-type="doi" xlink:href="10.2196/57778" vol="26" page="e57778" xlink:type="simple">http://www.jmir.org/2024/1/e57778/</related-article>
      <kwd-group>
        <kwd>artificial intelligence</kwd>
        <kwd>ChatGPT</kwd>
        <kwd>Bloom taxonomy</kwd>
        <kwd>AI</kwd>
        <kwd>cognition</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <p>We are inspired by Herrmann-Werner et al’s [<xref ref-type="bibr" rid="ref1">1</xref>] article, which assesses GPT-4’s cognitive functions based on the Bloom taxonomy. Using the Bloom taxonomy, which is traditionally applied to human learners, to evaluate GPT-4’s understanding of specific knowledge is a novel approach. The results could also offer insights into whether GPT-4 can think like a human. However, some points in this article need clarification.</p>
    <p>First, the question difficulty shown in Figure 3 might have been reported inversely in the abstract. According to the Quantitative Data Analysis subsection of the Methods, a difficulty value of 0 represents a very difficult question and a value of 1 represents a very easy question. Consequently, GPT-4 performed better on easy questions than on hard ones.</p>
    <p>Second, since a large language model (LLM) like GPT-4 operates by predicting the next word from its memory-based archive [<xref ref-type="bibr" rid="ref2">2</xref>], it seems unlikely that GPT-4 would perform worst in the “remember” domain of the Bloom taxonomy in this study (42.65%) yet excel in higher cognitive domains such as analyze, evaluate, and create, for which incorrect reasoning accounted for 0%, 0.15%, and 0%, respectively, as reported in Table 3 [<xref ref-type="bibr" rid="ref1">1</xref>]. In evaluating a “student’s” cognitive level within specific domains, the Bloom taxonomy categorizes the aims of the questions, not the answers. Therefore, evaluating GPT-4’s cognitive functions by analyzing its responses presupposes that GPT-4 can think like a human. However, given our current understanding of how LLMs generate answers (essentially by predicting the next word based on probabilities within a database), it is doubtful that the cognitive levels reflected in GPT-4’s responses can be accurately assessed using the Bloom taxonomy, especially given its high scores in advanced cognitive domains [<xref ref-type="bibr" rid="ref2">2</xref>].</p>
    <p>For example, when a question tests “memory” (eg, definitions, guidelines, or facts) and the relevant combination of elements exists in its database, GPT-4 can readily produce the most likely answer from its “memory.” Conversely, when elements are incorrectly combined, it may produce “hallucinated” answers [<xref ref-type="bibr" rid="ref2">2</xref>]. For complex questions that test higher cognitive domains (eg, analyzing a previously unpublished case report with findings from subjective and objective medical evaluations to deduce the most likely diagnosis), GPT-4 might still produce its answer from “memory” if a similar case or its key elements exist in its database, seemingly “analyzing, evaluating, and creating” an answer because it has “learned” from human problem-solving in similar cases. This “memory” function, considered the most potent capability of LLMs relative to humans, can nevertheless yield incorrect answers if the relevant “memory” does not exist in the database (eg, recent news) or is not predicted as the next word. The apparently high cognitive function might therefore result from the model’s ability to extract multiple human thought processes about a specific question from its vast database, much like a well-trained system mimicking human cognitive processes [<xref ref-type="bibr" rid="ref3">3</xref>,<xref ref-type="bibr" rid="ref4">4</xref>].</p>
    <p>Because most medical qualifying examinations consist mainly of “memory” questions, the actual proportion of incorrect reasoning in the “memory” domain could be lower when correct and incorrect answers are considered together. Until more evidence shows that LLMs can think like humans, evaluating LLM-generated answers with the Bloom taxonomy may yield misleading results.</p>
  </body>
  <back>
    <app-group/>
    <glossary>
      <title>Abbreviations</title>
      <def-list>
        <def-item>
          <term id="abb1">LLM</term>
          <def>
            <p>large language model</p>
          </def>
        </def-item>
      </def-list>
    </glossary>
    <fn-group>
      <fn fn-type="conflict">
        <p>None declared.</p>
      </fn>
    </fn-group>
    <ref-list>
      <ref id="ref1">
        <label>1</label>
        <nlm-citation citation-type="journal">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Herrmann-Werner</surname>
              <given-names>A</given-names>
            </name>
            <name name-style="western">
              <surname>Festl-Wietek</surname>
              <given-names>T</given-names>
            </name>
            <name name-style="western">
              <surname>Holderried</surname>
              <given-names>F</given-names>
            </name>
            <name name-style="western">
              <surname>Herschbach</surname>
              <given-names>L</given-names>
            </name>
            <name name-style="western">
              <surname>Griewatz</surname>
              <given-names>J</given-names>
            </name>
            <name name-style="western">
              <surname>Masters</surname>
              <given-names>K</given-names>
            </name>
            <name name-style="western">
              <surname>Zipfel</surname>
              <given-names>S</given-names>
            </name>
            <name name-style="western">
              <surname>Mahling</surname>
              <given-names>M</given-names>
            </name>
          </person-group>
          <article-title>Assessing ChatGPT's mastery of Bloom's taxonomy using psychosomatic medicine exam questions: mixed-methods study</article-title>
          <source>J Med Internet Res</source>
          <year>2024</year>
          <month>01</month>
          <day>23</day>
          <volume>26</volume>
          <fpage>e52113</fpage>
          <comment>
            <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://www.jmir.org/2024/1/e52113/"/>
          </comment>
          <pub-id pub-id-type="doi">10.2196/52113</pub-id>
          <pub-id pub-id-type="medline">38261378</pub-id>
          <pub-id pub-id-type="pii">v26i1e52113</pub-id>
          <pub-id pub-id-type="pmcid">PMC10848129</pub-id>
        </nlm-citation>
      </ref>
      <ref id="ref2">
        <label>2</label>
        <nlm-citation citation-type="web">
          <article-title>GPT-4 system card</article-title>
          <source>OpenAI</source>
          <year>2023</year>
          <month>03</month>
          <day>23</day>
          <access-date>2024-04-09</access-date>
          <comment>
            <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://cdn.openai.com/papers/gpt-4-system-card.pdf">https://cdn.openai.com/papers/gpt-4-system-card.pdf</ext-link>
          </comment>
        </nlm-citation>
      </ref>
      <ref id="ref3">
        <label>3</label>
        <nlm-citation citation-type="journal">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Stanovich</surname>
              <given-names>KE</given-names>
            </name>
            <name name-style="western">
              <surname>West</surname>
              <given-names>RF</given-names>
            </name>
          </person-group>
          <article-title>Individual differences in reasoning: implications for the rationality debate?</article-title>
          <source>Behav Brain Sci</source>
          <year>2000</year>
          <month>10</month>
          <volume>23</volume>
          <issue>5</issue>
          <fpage>645</fpage>
          <lpage>65; discussion 665</lpage>
          <pub-id pub-id-type="doi">10.1017/s0140525x00003435</pub-id>
          <pub-id pub-id-type="medline">11301544</pub-id>
        </nlm-citation>
      </ref>
      <ref id="ref4">
        <label>4</label>
        <nlm-citation citation-type="book">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Kahneman</surname>
              <given-names>D</given-names>
            </name>
          </person-group>
          <source>Thinking, Fast and Slow</source>
          <year>2011</year>
          <publisher-loc>New York</publisher-loc>
          <publisher-name>Farrar, Straus and Giroux</publisher-name>
        </nlm-citation>
      </ref>
    </ref-list>
  </back>
</article>
