<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.0 20040830//EN" "http://dtd.nlm.nih.gov/publishing/2.0/journalpublishing.dtd">
<?covid-19-tdm?>
<article xmlns:xlink="http://www.w3.org/1999/xlink" article-type="letter" dtd-version="2.0">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">JMIR</journal-id>
      <journal-id journal-id-type="nlm-ta">J Med Internet Res</journal-id>
      <journal-title>Journal of Medical Internet Research</journal-title>
      <issn pub-type="epub">1438-8871</issn>
      <publisher>
        <publisher-name>JMIR Publications</publisher-name>
        <publisher-loc>Toronto, Canada</publisher-loc>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="publisher-id">v24i2e31978</article-id>
      <article-id pub-id-type="pmid">35195531</article-id>
      <article-id pub-id-type="doi">10.2196/31978</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Letter to the Editor</subject>
        </subj-group>
        <subj-group subj-group-type="article-type">
          <subject>Letter to the Editor</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Getting a Vaccine, Jab, or Vax Is More Than a Regular Expression. Comment on “COVID-19 Vaccine-Related Discussion on Twitter: Topic Modeling and Sentiment Analysis”</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="editor">
          <name>
            <surname>Leung</surname>
            <given-names>Tiffany</given-names>
          </name>
        </contrib>
      </contrib-group>
      <contrib-group>
        <contrib id="contrib1" contrib-type="author" corresp="yes">
          <name name-style="western">
            <surname>Cummins</surname>
            <given-names>Jack Alexander</given-names>
          </name>
          <xref rid="aff1" ref-type="aff">1</xref>
          <address>
            <institution>Manchester Essex Regional High School</institution>
            <addr-line>36 Lincoln Street</addr-line>
            <addr-line>Manchester, MA, 01944</addr-line>
            <country>United States</country>
            <phone>1 9788101169</phone>
            <email>2jackcummins@gmail.com</email>
          </address>
          <ext-link ext-link-type="orcid">https://orcid.org/0000-0003-0978-0421</ext-link>
        </contrib>
      </contrib-group>
      <aff id="aff1">
        <label>1</label>
        <institution>Manchester Essex Regional High School</institution>
        <addr-line>Manchester, MA</addr-line>
        <country>United States</country>
      </aff>
      <author-notes>
        <corresp>Corresponding Author: Jack Alexander Cummins <email>2jackcummins@gmail.com</email></corresp>
      </author-notes>
      <pub-date pub-type="collection">
        <month>2</month>
        <year>2022</year>
      </pub-date>
      <pub-date pub-type="epub">
        <day>23</day>
        <month>2</month>
        <year>2022</year>
      </pub-date>
      <volume>24</volume>
      <issue>2</issue>
      <elocation-id>e31978</elocation-id>
      <history>
        <date date-type="received">
          <day>12</day>
          <month>7</month>
          <year>2021</year>
        </date>
        <date date-type="accepted">
          <day>1</day>
          <month>2</month>
          <year>2022</year>
        </date>
      </history>
      <copyright-statement>©Jack Alexander Cummins. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 23.02.2022.</copyright-statement>
      <copyright-year>2022</copyright-year>
      <license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
        <p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.</p>
      </license>
      <self-uri xlink:href="https://www.jmir.org/2022/2/e31978" xlink:type="simple"/>
      <related-article related-article-type="commentary-article" id="v23i6e24435" ext-link-type="doi" xlink:href="10.2196/24435" vol="23" page="e24435" xlink:type="simple">https://www.jmir.org/2021/6/e24435</related-article>
      <kwd-group>
        <kwd>COVID-19</kwd>
        <kwd>vaccine</kwd>
        <kwd>vaccination</kwd>
        <kwd>Twitter</kwd>
        <kwd>infodemiology</kwd>
        <kwd>infoveillance</kwd>
        <kwd>topic</kwd>
        <kwd>sentiment</kwd>
        <kwd>opinion</kwd>
        <kwd>discussion</kwd>
        <kwd>communication</kwd>
        <kwd>social media</kwd>
        <kwd>perception</kwd>
        <kwd>concern</kwd>
        <kwd>emotion</kwd>
        <kwd>natural language processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <p>Lyu et al [<xref ref-type="bibr" rid="ref1">1</xref>] used natural language processing techniques to analyze the topics and sentiment of COVID-19 vaccine–related Twitter conversations in the Twitter chatter data set created by Georgia State University’s Panacea Lab. Tweets containing any of the keywords “vaccination,” “vaccinations,” “vaccine,” “vaccines,” “immunization,” “vaccinate,” or “vaccinated” were selected for further analysis (eg, topic modeling and sentiment analysis).</p>
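    <p>The keyword selection described above can be sketched as a single case-insensitive, whole-word regular expression. The Python sketch below assumes whole-word matching, because the exact matching rules Lyu et al used are not described in this letter. The helper names <monospace>build_keyword_pattern</monospace> and <monospace>select_tweets</monospace> are hypothetical, and the sample tweets are invented.</p>
    <preformat>
```python
import re

# Keywords listed by Lyu et al; whole-word, case-insensitive matching is an assumption.
LYU_KEYWORDS = [
    "vaccination", "vaccinations", "vaccine", "vaccines",
    "immunization", "vaccinate", "vaccinated",
]


def build_keyword_pattern(keywords):
    """Compile one case-insensitive pattern matching any keyword as a whole word."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def select_tweets(tweets, pattern):
    """Return the tweets that contain at least one keyword."""
    return [t for t in tweets if pattern.search(t)]


if __name__ == "__main__":
    pattern = build_keyword_pattern(LYU_KEYWORDS)
    sample = [  # invented tweets
        "Got my COVID vaccine today!",
        "Just got jabbed, arm is sore",
        "Booked my second shot for Friday",
        "Vaccinations open to all adults next week",
    ]
    for tweet in sample:
        print(pattern.search(tweet) is not None, tweet)
```
    </preformat>
    <p>On the sample tweets, the pattern selects the first and last tweets. It misses the tweets that say “jabbed” and “shot,” which are the colloquial terms this letter is concerned with.</p>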
    <p>I suggest that the study could be strengthened by adding keyword searches for vaccination synonyms identified with a method such as the Continuous Bag of Words (CBOW) word2vec model [<xref ref-type="bibr" rid="ref2">2</xref>]. Although the study focused on formal terms such as “vaccination” and “vaccine,” colloquial terms are also commonly used on Twitter and other social media platforms to describe getting vaccinated.</p>
    <p>I identified synonyms for “vaccination” commonly used on Twitter with the <italic>gensim</italic> implementation of the CBOW word2vec model [<xref ref-type="bibr" rid="ref3">3</xref>]. The model learns a vector representation of each word by predicting that word from the words surrounding it. As a result, words that appear in similar contexts receive similar vectors, and words with similar vectors are more likely to be synonyms or related terms than words with dissimilar vectors. I trained the model on 503,862 tweets from June 24 to 27, 2021. These tweets contained the keyword “covid” or “corona” and were collected with the <italic>rtweet</italic> package [<xref ref-type="bibr" rid="ref4">4</xref>]. The keyword pattern search also returned tweets containing related words such as “covid-19” or “coronavirus.”</p>
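    <p>The context-window idea behind CBOW can be illustrated without any library. The Python sketch below builds the (context, target) pairs that a CBOW model is trained on. It then shows that “vaccine” and “jab” occur in identical contexts across four invented tweets. The sketch does not train vectors; in <italic>gensim</italic>, that training is done by the <monospace>Word2Vec</monospace> class, which uses CBOW under its default <monospace>sg=0</monospace>. This letter does not report the window size or other training settings, so <monospace>window=2</monospace> and the helper names here are assumptions.</p>
    <preformat>
```python
# Illustrative sketch: builds CBOW-style (context, target) pairs; it does not train vectors.


def cbow_examples(tokens, window=2):
    """Return (context, target) pairs in the CBOW style.

    The context holds up to `window` tokens on each side of the target;
    a CBOW model learns word vectors by predicting each target from its context.
    """
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:  # skip targets with no context (e.g., one-word tweets)
            examples.append((context, target))
    return examples


def context_profile(tweets, word, window=2):
    """Collect every context word observed around `word` across tokenized tweets."""
    seen = set()
    for tokens in tweets:
        for context, target in cbow_examples(tokens, window):
            if target == word:
                seen.update(context)
    return seen


if __name__ == "__main__":
    # Hypothetical tweets, already lowercased and split into tokens.
    tweets = [
        "just got my first vaccine today".split(),
        "just got my first jab today".split(),
        "booked my second vaccine appointment".split(),
        "booked my second jab appointment".split(),
    ]
    for context, target in cbow_examples(tweets[0]):
        print(target, "is predicted from", context)
    shared = context_profile(tweets, "vaccine").intersection(context_profile(tweets, "jab"))
    print(sorted(shared))  # ['appointment', 'first', 'my', 'second', 'today']
```
    </preformat>
    <p>In this toy example, the two words share every context word, so a trained CBOW model would tend to place their vectors close together.</p>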
    <p>Of the 503,862 COVID-19–related tweets downloaded with <italic>rtweet</italic>, 94,768 contained at least one of the keywords used in the study by Lyu et al [<xref ref-type="bibr" rid="ref1">1</xref>]. In addition, 22,587 tweets used the terms “shot,” “shots,” “jab,” “jabs,” “jabbed,” “vax,” or “vaxxed.” The words “shot” or “shots” appeared in 9017 tweets, “jab,” “jabs,” or “jabbed” in 7021 tweets, and “vax” or “vaxxed” in 4081 tweets. Of the 22,587 tweets containing these alternative terms, 15,855 (70.2%) were posted by users who disclosed a location in their user profile. The Nominatim application programming interface [<xref ref-type="bibr" rid="ref5">5</xref>] returned a geocoded location, including country, for 13,101 of these 15,855 user-disclosed locations. Of the 13,101 geocoded tweets, 3111 were from the United Kingdom, and 2261 of these used “jab,” “jabs,” or “jabbed.” Another 4910 were from the United States; of these, 2704 used “shot” or “shots” and 1130 used “vax” or “vaxxed.”</p>
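    <p>The per-term and per-country counts above amount to a cross-tabulation of term groups by country. The Python sketch below assumes that each tweet’s country has already been resolved, for example by geocoding the profile location with Nominatim; the sketch itself makes no network calls. It also assumes that terms are matched as whole words regardless of case. The term groups follow this letter, but the records are invented and the helper names are hypothetical. A tweet counts once for each group it contains, so a single tweet can contribute to more than one group.</p>
    <preformat>
```python
import re
from collections import Counter

# Term groups reported in this letter; whole-word, case-insensitive matching is an assumption.
TERM_GROUPS = {
    "shot": ["shot", "shots"],
    "jab": ["jab", "jabs", "jabbed"],
    "vax": ["vax", "vaxxed"],
}

PATTERNS = {
    group: re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE)
    for group, words in TERM_GROUPS.items()
}


def groups_in(text):
    """Return the set of term groups that appear in a tweet."""
    return {group for group, pattern in PATTERNS.items() if pattern.search(text)}


def tabulate(records):
    """Count tweets per (country, group); a tweet counts once for each group it contains.

    `records` holds (text, country) pairs; country is None when the profile
    location was missing or could not be geocoded.
    """
    table = Counter()
    for text, country in records:
        for group in groups_in(text):
            table[(country, group)] += 1
    return table


def percent(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    # Invented records; countries stand in for the output of a geocoding step.
    records = [
        ("Just had my second jab", "United Kingdom"),
        ("Got my shot, now fully vaxxed", "United States"),
        ("Booster shots next month?", "United States"),
        ("Jabbed and happy", None),
    ]
    for (country, group), n in sorted(tabulate(records).items(), key=str):
        print(country, group, n)
    print(percent(15855, 22587))  # 70.2, the share reported in the letter
```
    </preformat>
    <p>The last line recomputes the 70.2% share reported above from the counts 15,855 and 22,587.</p>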
    <p>I propose that researchers performing keyword searches of social media chatter consider using the CBOW word2vec model for two purposes: to expand the number of posts they capture and to reduce the geographic or population bias that can arise from preselecting terminology. The regional preference for “jab” in the United Kingdom and “shot” in the United States illustrates this bias. The CBOW word2vec model can help capture the full range of word choices used by social media users.</p>
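    <p>One way to apply this proposal is to expand a seed keyword list with each seed’s nearest neighbors in the embedding space. In <italic>gensim</italic>, the <monospace>wv.most_similar</monospace> method of a trained model returns such a ranking. The dependency-free Python sketch below performs the same kind of cosine-similarity ranking over hypothetical three-dimensional vectors, which stand in for trained word2vec embeddings. The cutoffs <monospace>topn</monospace> and <monospace>min_similarity</monospace> are illustrative assumptions, not values from this letter.</p>
    <preformat>
```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_keywords(seeds, vectors, topn=3, min_similarity=0.7):
    """Append each seed's top-n nearest neighbors that clear the similarity cutoff."""
    expanded = list(seeds)
    for seed in seeds:
        if seed not in vectors:
            continue  # no embedding for this seed, so nothing to rank against
        scored = sorted(
            ((cosine(vectors[seed], vec), word) for word, vec in vectors.items() if word != seed),
            reverse=True,
        )
        for score, word in scored[:topn]:
            if score >= min_similarity and word not in expanded:
                expanded.append(word)
    return expanded


if __name__ == "__main__":
    # Hypothetical 3-dimensional vectors standing in for trained word2vec embeddings.
    vectors = {
        "vaccine": [0.9, 0.1, 0.0],
        "jab": [0.85, 0.15, 0.05],
        "shot": [0.8, 0.2, 0.1],
        "vax": [0.88, 0.05, 0.02],
        "lockdown": [0.1, 0.9, 0.2],
        "mask": [0.05, 0.8, 0.4],
    }
    print(expand_keywords(["vaccine"], vectors))  # ['vaccine', 'vax', 'jab', 'shot']
```
    </preformat>
    <p>With these toy vectors, “vax,” “jab,” and “shot” are added to the seed “vaccine,” while “lockdown” and “mask” fall below the similarity cutoff.</p>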
  </body>
  <back>
    <app-group/>
    <glossary>
      <title>Abbreviations</title>
      <def-list>
        <def-item>
          <term id="abb1">CBOW</term>
          <def>
            <p>Continuous Bag of Words</p>
          </def>
        </def-item>
      </def-list>
    </glossary>
    <fn-group>
      <fn fn-type="conflict">
        <p>None declared.</p>
      </fn>
      <fn fn-type="other">
        <p>
          <bold>Editorial Notice</bold>
        </p>
        <p>The corresponding author of “<italic>COVID-19 Vaccine–Related Discussion on Twitter: Topic Modeling and Sentiment Analysis</italic>” declined to respond to this letter.</p>
      </fn>
    </fn-group>
    <ref-list>
      <ref id="ref1">
        <label>1</label>
        <nlm-citation citation-type="journal">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Lyu</surname>
              <given-names>Joanne Chen</given-names>
            </name>
            <name name-style="western">
              <surname>Han</surname>
              <given-names>Eileen Le</given-names>
            </name>
            <name name-style="western">
              <surname>Luli</surname>
              <given-names>Garving K</given-names>
            </name>
          </person-group>
          <article-title>COVID-19 Vaccine-Related Discussion on Twitter: Topic Modeling and Sentiment Analysis</article-title>
          <source>J Med Internet Res</source>
          <year>2021</year>
          <month>06</month>
          <day>29</day>
          <volume>23</volume>
          <issue>6</issue>
          <fpage>e24435</fpage>
          <comment>
            <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://www.jmir.org/2021/6/e24435/"/>
          </comment>
          <pub-id pub-id-type="doi">10.2196/24435</pub-id>
          <pub-id pub-id-type="medline">34115608</pub-id>
          <pub-id pub-id-type="pii">v23i6e24435</pub-id>
          <pub-id pub-id-type="pmcid">PMC8244724</pub-id>
        </nlm-citation>
      </ref>
      <ref id="ref2">
        <label>2</label>
        <nlm-citation citation-type="journal">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Mikolov</surname>
              <given-names>T</given-names>
            </name>
            <name name-style="western">
              <surname>Chen</surname>
              <given-names>K</given-names>
            </name>
            <name name-style="western">
              <surname>Corrado</surname>
              <given-names>G</given-names>
            </name>
            <name name-style="western">
              <surname>Dean</surname>
              <given-names>J</given-names>
            </name>
          </person-group>
          <article-title>Efficient Estimation of Word Representations in Vector Space</article-title>
          <source>arXiv</source>
          <comment>Preprint posted online Jan 16, 2013.
            <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="http://arxiv.org/abs/1301.3781"/>
          </comment>
        </nlm-citation>
      </ref>
      <ref id="ref3">
        <label>3</label>
        <nlm-citation citation-type="confproc">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Řehůřek</surname>
              <given-names>R</given-names>
            </name>
            <name name-style="western">
              <surname>Sojka</surname>
              <given-names>P</given-names>
            </name>
          </person-group>
          <article-title>Software Framework for Topic Modelling with Large Corpora</article-title>
          <year>2010</year>
          <conf-name>Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks</conf-name>
          <conf-date>May 22</conf-date>
          <conf-loc>Valletta, Malta</conf-loc>
          <comment>
            <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://radimrehurek.com/lrec2010_final.pdf"/>
          </comment>
          <pub-id pub-id-type="doi">10.13140/2.1.2393.1847</pub-id>
        </nlm-citation>
      </ref>
      <ref id="ref4">
        <label>4</label>
        <nlm-citation citation-type="journal">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Kearney</surname>
              <given-names>M</given-names>
            </name>
          </person-group>
          <article-title>rtweet: Collecting and analyzing Twitter data</article-title>
          <source>J Open Source Softw</source>
          <year>2019</year>
          <month>10</month>
          <volume>4</volume>
          <issue>42</issue>
          <fpage>1829</fpage>
          <pub-id pub-id-type="doi">10.21105/joss.01829</pub-id>
        </nlm-citation>
      </ref>
      <ref id="ref5">
        <label>5</label>
        <nlm-citation citation-type="web">
          <article-title>Nominatim</article-title>
          <source>OpenStreetMap Wiki</source>
          <access-date>2021-07-12</access-date>
          <comment>
            <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://wiki.openstreetmap.org/wiki/Nominatim">https://wiki.openstreetmap.org/wiki/Nominatim</ext-link>
          </comment>
        </nlm-citation>
      </ref>
    </ref-list>
  </back>
</article>
