Creating COVID-19 Stigma by Referencing the Novel Coronavirus as the “Chinese virus” on Twitter: Quantitative Analysis of Social Media Data

Background: Stigma is the deleterious, structural force that devalues members of groups that hold undesirable characteristics. Since stigma is created and reinforced by society—through in-person and online social interactions—referencing the novel coronavirus as the “Chinese virus” or “China virus” has the potential to create and perpetuate stigma. Objective: The aim of this study was to assess if there was an increase in the prevalence and frequency of the phrases “Chinese virus” and “China virus” on Twitter after the March 16, 2020, US presidential reference of this term. Methods: Using the Sysomos software (Sysomos, Inc), we extracted tweets from the United States using a list of keywords that were derivatives of “Chinese virus.” We compared tweets at the national and state levels posted between March 9 and March 15 (preperiod) with those posted between March 19 and March 25 (postperiod). We used Stata 16 (StataCorp) for quantitative analysis, and Python (Python Software Foundation) to plot a state-level heat map. Results: A total of 16,535 “Chinese virus” or “China virus” tweets were identified in the preperiod, and 177,327 tweets were identified in the postperiod, illustrating a nearly ten-fold increase at the national level. All 50 states witnessed an increase in the number of tweets exclusively mentioning “Chinese virus” or “China virus” instead of coronavirus disease (COVID-19) or coronavirus. On average, 0.38 tweets referencing “Chinese virus” or “China virus” were posted per 10,000 people at the state level in the preperiod, and 4.08 of these stigmatizing tweets were posted in the postperiod, also indicating a ten-fold increase. The 5 states with the highest number of postperiod “Chinese virus” tweets were Pennsylvania (n=5249), New York (n=11,754), Florida (n=13,070), Texas (n=14,861), and California (n=19,442). 
Adjusting for population size, the 5 states with the highest prevalence of postperiod “Chinese virus” tweets were Arizona (5.85), New York (6.04), Florida (6.09), Nevada (7.72), and Wyoming (8.76). The 5 states with the largest increase in pre- to postperiod “Chinese virus” tweets were Kansas (n=697/58, 1202%), South Dakota (n=185/15, 1233%), Mississippi (n=749/54, 1387%), New Hampshire (n=582/41, 1420%), and Idaho (n=670/46, 1457%). Conclusions: The rise in tweets referencing “Chinese virus” or “China virus,” along with the content of these tweets, indicates that knowledge translation may be occurring online and COVID-19 stigma is likely being perpetuated on Twitter. (J Med Internet Res 2020;22(5):e19301) doi: 10.2196/19301


Introduction
Stigma is the deleterious, structural force that devalues those who hold undesirable characteristics [1]. Stigma is a social process that occurs between groups; this process can occur in person and online [2-6]. Regardless of setting, research has consistently found that stigma is associated with negative health outcomes [2,4,6-9]. For example, HIV-related stigma has pushed the HIV epidemic underground, fueling ongoing transmission [10], and other disease-related stigmas are associated with negative health outcomes ranging from missed clinical visits to suicidal ideation [1,6,9]. There is evidence that stigma can become internalized, and internalized stigma can lead to distrust of health professionals, skepticism of public health systems, and an unwillingness to disclose behaviors related to transmission [2,8,9]. Because the coronavirus disease (COVID-19) is infectious, contact tracing is critically important to assessing community spread; thus, it is imperative that individuals trust their public health and health care systems so that they are willing to accept testing and, if diagnosed with COVID-19, report their whereabouts and activities. Therefore, creating and perpetuating stigma related to COVID-19 could be detrimental to public health efforts that require potentially stigmatized individuals to engage with their health systems.
On March 16, 2020, the president of the United States referred to the novel coronavirus as the "Chinese virus" on Twitter. He tweeted "The United States will be powerfully supporting those industries... that are particularly affected by the Chinese Virus..." After this presidential reference, a dialogue emerged examining if the phrase "Chinese virus" was xenophobic and stigmatizing, considering the availability of alternative scientific names such as coronavirus or COVID-19. Since stigma is created and perpetuated by society through social interaction and public commentary (eg, use of the term "Chinese virus" instead of scientific terms on Twitter), and stigma is reinforced by those in power (eg, use of the term "Chinese virus" by the US president), we hypothesized that there would be an increase in the frequency of the phrases "Chinese virus" and "China virus" on Twitter, comparing the prevalence of these phrases before and after the presidential reference.

Twitter
Twitter is an online social media platform where users send and receive short posts (maximum 280 characters) called tweets. Twitter currently has 152 million daily users, who produce about 500 million daily tweets [11].

Data, Tweets
We downloaded tweets from all 50 US states using the Sysomos software (Sysomos, Inc). We extracted tweets that mentioned "Chinese virus" or "China virus" but did not contain "COVID-19" or "coronavirus." The keywords referencing the "Chinese virus" were "Chinesevirus," "Chinese virus," "Chinavirus," "China virus," "#ChineseVirus19," "#Chinesevirus," "#ChineseVirusCorona," and "#Chinavirus." We excluded tweets containing the keywords "coronavirus," "corona virus," "COVID-19," "COVID19," "#COVID2019," and "#corona." By excluding tweets that contained both "Chinese virus" and "coronavirus," we collated a sample of tweets that represented the intent of using "Chinese virus" in place of a scientific alternative, likely indicating deliberate stigmatization. We imputed the location of tweets based on Twitter users' self-reported state of residence. Tweets posted between March 9 and March 15, 2020 (preperiod), were compared with tweets posted between March 19 and March 25, 2020 (postperiod). Original tweets and quote tweets (adding comments to an existing tweet) were included, but retweets (reposting of an existing tweet) were not. Our final sample (N=193,862) contained all tweets posted in the pre- and postperiods by US-based Twitter users that exclusively mentioned a derivative of "Chinese virus." Data extraction was conducted on April 10, 2020. Ethical approval was provided by the University of Alabama at Birmingham Institutional Review Board (IRB-#300005071).
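The inclusion and exclusion rule described above can be illustrated with a minimal Python sketch. This is not the Sysomos query used in the study, whose syntax is not given here; the function names and the case-insensitive substring matching are assumptions made for illustration only.

```python
# Keyword lists as stated in the text; lowercased for case-insensitive matching.
INCLUDE_TERMS = [
    "chinesevirus", "chinese virus", "chinavirus", "china virus",
    "#chinesevirus19", "#chinesevirus", "#chineseviruscorona", "#chinavirus",
]
EXCLUDE_TERMS = [
    "coronavirus", "corona virus", "covid-19", "covid19", "#covid2019", "#corona",
]


def mentions_any(text, terms):
    """Return True if any term occurs in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def is_exclusive_mention(text):
    """Keep a tweet only if it uses a 'Chinese virus' term and no scientific term."""
    return mentions_any(text, INCLUDE_TERMS) and not mentions_any(text, EXCLUDE_TERMS)


# Hypothetical example tweets (not drawn from the study sample).
examples = [
    "The Chinese virus is spreading",   # kept: stigmatizing term only
    "Chinese virus, aka coronavirus",   # dropped: also uses a scientific term
    "Stay home, COVID19 is serious",    # dropped: no stigmatizing term
]
kept = [tweet for tweet in examples if is_exclusive_mention(tweet)]
```

Under this sketch, only the first hypothetical tweet is retained. Retweets would still need to be removed separately, as described in the text.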

Analysis
We used Stata 16 (StataCorp) to analyze our Twitter data and Python software (Python Software Foundation) to plot our state-level gradient heat map.
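The text does not say which Python plotting library was used for the heat map. As one possible approach, the sketch below uses the third-party plotly package (an assumption, not the authors' stated method) and the five state-level increases reported in the abstract. The helper names are hypothetical.

```python
# Pre- to postperiod increases (%) for the five states named in the abstract.
STATE_INCREASE_PCT = {
    "Kansas": 1202,
    "South Dakota": 1233,
    "Mississippi": 1387,
    "New Hampshire": 1420,
    "Idaho": 1457,
}

# Two-letter postal codes, which plotly's "USA-states" location mode expects.
STATE_CODES = {
    "Kansas": "KS",
    "South Dakota": "SD",
    "Mississippi": "MS",
    "New Hampshire": "NH",
    "Idaho": "ID",
}


def to_choropleth_inputs(increase_by_state):
    """Return parallel lists of state codes and values, ordered by increase."""
    names = sorted(increase_by_state, key=increase_by_state.get)
    codes = [STATE_CODES[name] for name in names]
    values = [increase_by_state[name] for name in names]
    return codes, values


def build_heat_map(increase_by_state):
    """Build a US choropleth in which darker shades mean larger increases."""
    # Imported here so the data helper above works without plotly installed.
    import plotly.express as px

    codes, values = to_choropleth_inputs(increase_by_state)
    return px.choropleth(
        locations=codes,
        locationmode="USA-states",
        color=values,
        scope="usa",
        color_continuous_scale="Reds",
    )
```

Calling `build_heat_map(STATE_INCREASE_PCT).show()` would render the map; a full figure would supply values for all 50 states.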

Results
A total of 16,535 "Chinese virus" or "China virus" tweets were identified in the preperiod, and 177,327 tweets were identified in the postperiod, illustrating a 972.43% (n=160,792/16,535) increase. Comparatively, the number of tweets referencing COVID-19 in the preperiod and postperiod remained steady, at about 4.9 million tweets per period. A total of 13,569 (82.06%) of the preperiod and 145,521 (82.06%) of the postperiod tweets were associated with a Twitter user's self-reported US state. Figure 1 is a heat map illustrating the state-by-state increases of tweets referencing "Chinese virus" or "China virus." The darker the shade, the greater the increase. All 50 US states witnessed an increase in the number of tweets exclusively mentioning "Chinese virus" or "China virus" rather than COVID-19 or coronavirus. The 5 US states with the highest number of postperiod "Chinese virus" tweets were Pennsylvania, New York, Florida, Texas, and California. The 5 US states with the largest increase in pre- to postperiod "Chinese virus" tweets were Kansas, South Dakota, Mississippi, New Hampshire, and Idaho. In Table 1, we present US state-level results of tweets referencing "Chinese virus" or "China virus." On average, at the state level, 271 such tweets were found in the preperiod and 2910 in the postperiod, indicating a ten-fold increase, similar to what we found at the national level. We also calculated the percentage increase and the prevalence increase. The percentage increase measures the percentage of all COVID-19 related tweets that mentioned "China virus" or "Chinese virus" exclusively. To account for variations in population size, prevalence of "Chinese virus" tweets per 10,000 people for each US state was calculated using the following formula: $\text{prevalence} = \dfrac{\text{number of tweets in the state}}{\text{state population}} \times 10{,}000$. State population sizes were taken from the 2019 US Census Bureau estimates [12]. On average, the state-level percentage increase was 997%, with a minimum of 661% and a maximum of 1447%. 
Similarly, the prevalence increase mean was 1015%, with a minimum of 734% and a maximum of 1456%. Large variations were found across US states, with the lowest postperiod prevalence of "Chinese virus" or "China virus" in South Dakota and the highest in Wyoming. The 5 US states with the highest prevalence of "Chinese virus" or "China virus" postperiod tweets were Arizona, New York, Florida, Nevada, and Wyoming.
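The two quantities reported above, prevalence per 10,000 people and the pre- to postperiod percentage increase, can be sketched in Python as follows. The function names are illustrative. The percentage increase is defined here as (post − pre) / pre × 100, which reproduces the national figure of 972.43%; the state population and tweet count in the prevalence example are hypothetical placeholders, not study data.

```python
def percentage_increase(pre_count, post_count):
    """Relative change from the preperiod to the postperiod, in percent."""
    return (post_count - pre_count) / pre_count * 100


def prevalence_per_10k(tweet_count, population):
    """Number of tweets per 10,000 residents."""
    return tweet_count / population * 10_000


# National tweet counts reported in the Results.
national_pre, national_post = 16_535, 177_327
national_increase = round(percentage_increase(national_pre, national_post), 2)
print(national_increase)  # 972.43

# Hypothetical state: 500 tweets among 1,000,000 residents.
example_prevalence = round(prevalence_per_10k(500, 1_000_000), 2)
print(example_prevalence)  # 5.0
```

Dividing by population before comparing states is what lets a small state such as Wyoming rank highest on prevalence even though large states such as California have far more tweets in absolute terms.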

Principal Result
We found notable increases in the use of the terms "Chinese virus" and "China virus" on Twitter at both the national and state levels by comparing these tweets (percentage and prevalence) both before and after the March 16, 2020, presidential reference.

Limitations
The pandemic is currently underway, so Twitter data are rapidly shifting in both quantity and content. We were unable to screen for automatically generated tweets (bots) within this short report [13,14]. Geographic locations associated with Twitter accounts were self-reported; thus, it is possible that some Twitter users may have moved without updating their state location or may have reported a false state location.

Comparison With Prior Work
There is a growing body of academic literature that leverages Twitter data to assess trends in population health and public sentiment [15][16][17]. Chew and Eysenbach [18] conducted a seminal examination of knowledge translation using Twitter data during the H1N1 outbreak; they found the proportion of tweets using "H1N1" increased over time compared to the relative use of "swine flu," suggesting that the media's choice in terminology (shifting from using the term "swine flu" to "H1N1") influenced public uptake. In addition, it is relevant that a recent publication by Logie and Turan [19] presented a narrative on how stigma can hurt the COVID-19 public health response. This short report was developed considering the findings from prior studies.

Future Research
Future research could evaluate how stigma mechanisms operate online, validate whether Twitter and social media data can inform epidemic surveillance and health communication, examine the extent to which Twitter and social media data are reliable in informing public health efforts and social science research, and explore how Twitter users view COVID-19 and the COVID-19 public health response (eg, testing, linkage to care).
Additionally, although there is a growing body of research using tweets to examine aspects of the novel coronavirus [20][21][22], to our knowledge, no studies have included a comprehensive set of search terms, which may include phrases such as "ncov," "covid," "sars-cov," and "rona," in defining their samples. If data extraction is not comprehensive, we run the risk of missing emerging sentiments and terminology, such as referencing the novel coronavirus as the "China virus" or "Chinese virus," and sociobehavioral outcomes related to these trends.

Conclusions
The rise in tweets citing "Chinese virus" or "China virus" instead of COVID-19 or the novel coronavirus after the presidential reference on Twitter, along with the content of these tweets, indicates that knowledge translation may be occurring online and COVID-19 stigma is likely being perpetuated on Twitter. Perpetuating COVID-19-related stigma by using the phrase "Chinese virus" could harm public health efforts to address the pandemic, specifically by inciting fear and increasing distrust of public health systems among Chinese and Asian Americans. If these stigmatizing terms persist as malicious synonyms for the novel coronavirus, reparative efforts may be required to restore trust among marginalized communities.