This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
The Centers for Disease Control and Prevention (CDC) is a national public health protection agency in the United States. With the escalating impact of the COVID-19 pandemic on society in the United States and around the world, the CDC has become one of the focal points of public discussion.
This study aims to identify the topics and their overarching themes emerging from the public COVID-19-related discussion about the CDC on Twitter and to further provide insight into the public's concerns, focus of attention, perception of the CDC's current performance, and expectations from the CDC.
Tweets were downloaded from a large-scale COVID-19 Twitter chatter data set from March 11, 2020, when the World Health Organization declared COVID-19 a pandemic, to August 14, 2020. We used R (The R Foundation) to clean the tweets and retain tweets that contained any of five specific keywords—cdc, CDC, centers for disease control and prevention, CDCgov, and cdcgov—while eliminating all 91 tweets posted by the CDC itself. The final data set included in the analysis consisted of 290,764 unique tweets from 152,314 different users. We used R to perform the latent Dirichlet allocation algorithm for topic modeling.
The Twitter data generated 16 topics that the public linked to the CDC when they talked about COVID-19. Among the topics, the most discussed was COVID-19 death counts, accounting for 12.16% (n=35,347) of the total 290,764 tweets in the analysis, followed by general opinions about the credibility of the CDC and other authorities and the CDC's COVID-19 guidelines, with over 20,000 tweets for each. The 16 topics fell into four overarching themes: knowing the virus and the situation, policy and government actions, response guidelines, and general opinion about credibility.
Social media platforms, such as Twitter, provide valuable databases for public opinion. In a protracted pandemic, such as COVID-19, quickly and efficiently identifying the topics within the public discussion on Twitter would help public health agencies improve the next round of communication with the public.
Since it was first identified in Wuhan, China, in December 2019, COVID-19 has spread rapidly around the world. On March 11, 2020, the World Health Organization (WHO) declared the coronavirus outbreak a pandemic and was unable to determine the duration of the pandemic [
The Centers for Disease Control and Prevention (CDC) is the “nation’s health protection agency, working 24/7 to protect America from health and safety threats” [
Social media platforms such as Twitter have not only become increasingly important for the public to seek, share, and discuss information, but have also provided valuable platforms for the surveillance of public opinion, allowing for the monitoring of the public’s questions, concerns, and responses to health threats [
The ongoing COVID-19 pandemic demands continuous and evolving efforts of using social media data to understand the public’s thoughts and concerns. While a series of studies made significant contributions along these lines, the majority of existing studies analyzed data collected before the middle of April 2020 [
The IDs from a total of 128,432,021 tweets, without retweets, from March 11 through August 14, 2020, were obtained using the data set maintained by Georgia State University’s Panacea Lab [
During the tokenizing stage, we used the
We converted all the tweets to lowercase and created a script to remove the URLs, mentioned names, non-ASCII (American Standard Code for Information Interchange) characters such as emojis, and anything other than English letters or spaces (eg, “1,” “?,” etc). Using the R package
Flowchart for data extraction and preprocessing. CDC: Centers for Disease Control and Prevention.
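The keyword filtering and text cleaning described above can be sketched in a few lines. The study used R; the Python below is only an illustrative stand-in. The function names (`mentions_cdc`, `clean_tweet`), the regular expressions, the substring-based keyword match, and the choice to filter on the raw text before removing mentions (so that `@CDCgov` is still visible to the filter) are all assumptions, not the authors' script. Removal of the CDC's own tweets is not shown.

```python
import re

# Keywords from the Methods. Matching is case-insensitive here, which
# collapses "cdc"/"CDC" and "CDCgov"/"cdcgov"; note that the substring
# "cdc" already subsumes "cdcgov", so the longer entries are kept only
# for readability. Substring matching is an assumption of this sketch.
KEYWORDS = ("cdc", "centers for disease control and prevention", "cdcgov")

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z ]")  # anything other than English letters or spaces
SPACES_RE = re.compile(r" +")


def mentions_cdc(raw_text: str) -> bool:
    """Return True if the raw tweet contains any CDC keyword."""
    lowered = raw_text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def clean_tweet(raw_text: str) -> str:
    """Lowercase, then strip URLs, @mentions, and every non-letter character."""
    text = raw_text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)  # also drops emojis, digits, punctuation
    return SPACES_RE.sub(" ", text).strip()


# Hypothetical example tweets, not taken from the data set.
tweets = [
    "@CDCgov says wear a mask! 😷 https://example.com/masks",
    "Stay home, flatten the curve.",
    "The CDC added 6 new symptoms?",
]
kept = [clean_tweet(t) for t in tweets if mentions_cdc(t)]
print(kept)
```

In this sketch the second tweet is dropped because it contains no keyword, and the two retained tweets are reduced to lowercase letters and single spaces.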
Topic modeling provides an automatic, or unsupervised, way of summarizing a large collection of documents. It can help discover hidden themes in the collection, group documents into the discovered themes, and summarize the documents by topic. Topic modeling is often referred to as
To extract common topics from this large number of tweets, we used the LDA algorithm for topic modeling. We ran the LDA algorithm on the data using the R
Weekly frequency of each topic on Twitter, from March 11 to August 14, 2020.
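To make the topic-modeling step described above more concrete, the following is a minimal sketch of latent Dirichlet allocation fitted by collapsed Gibbs sampling, written in plain Python. It is not the authors' R workflow: the source does not state the inference method or hyperparameters, so the sampler, the function name `lda_gibbs`, the values of `alpha`, `beta`, and `n_iter`, and the toy documents are all assumptions chosen for illustration. Assigning each document to its highest-count topic is likewise only one common convention.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA on a list of token lists.

    Returns (doc_topic_counts, topic_word_counts, vocab).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens of word w in topic k
    topic_total = [0] * n_topics                           # all tokens in topic k
    assignments = []

    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant:
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    return doc_topic, topic_word, vocab


# Toy corpus of already-cleaned, tokenized documents (hypothetical).
docs = [
    "mask wear mask cloth public".split(),
    "wear mask spread cloth".split(),
    "death count report death total".split(),
    "report death count week".split(),
]
doc_topic, topic_word, vocab = lda_gibbs(docs, n_topics=2)

# Show the highest-count terms for each topic.
for k, counts in enumerate(topic_word):
    top = sorted(zip(counts, vocab), reverse=True)[:3]
    print(f"topic {k}:", [w for _, w in top])

# Label each document with its dominant topic.
dominant = [max(range(2), key=lambda k: row[k]) for row in doc_topic]
print(dominant)
```

The top terms per topic play the role of the "terms contributing to topic model" column in a topic table, and counting documents per dominant topic yields per-topic totals of the kind reported in the Results.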
The Twitter data generated 16 topics that the public linked to the CDC when they talked about COVID-19.
COVID-19-related tweets about the Centers for Disease Control and Prevention (CDC) by topic.
Topic No. | Terms contributing to topic model | Total tweets (N=290,764), n (%) | Description | Examples of representative tweets (date posted) |
1 | Death, report, count, week, numb, total, datum, and pneumonia | 35,347 (12.16) | Discussion of COVID-19 death counting, focusing on whether the accounting of COVID-19 cases is accurate | “Actually, they aren’t even lying about it. The CDC was frank about changing their accounting of COVID-19 cases to encompass contacts with presumptive cases, statistically causing this sudden huge case explosion. [link]” (July 13, 2020) |
2 | Lie, Fauci, people, Trump, trust, doctor, listen, and vaccine | 26,026 (8.95) | General opinions about credibility of the CDC and other authorities | “Coronavirus is not political. the curve has not flattened and people are still dying. Trump and our leaders need to listen to scientists and the CDC. You are, most definitely, a cultist.” (July 10, 2020)(July 28, 2020) |
3 | Guideline, follow, follow_guideline, people, stay, recommendation, social, and distance | 20,032 (6.89) | CDC’s COVID-19 guidelines, with a considerable number of tweets about the social distancing and stay-at-home orders | “Bottom line: the more adherent we are to the CDC guidelines, the faster the economy will recover. Stop gathering, put masks on and we will get out of this. Labor secretary: discipline now in following COVID-19 guidelines will end economic slump 'quickly'. [link]” (April 4, 2020) |
4 | Mask, wear, wear_mask, spread, recommend, people, cloth, and public | 19,934 (6.86) | CDC’s recommendation of wearing face masks to slow the spread of virus | “@[tag] @[tag] @[tag] well, here's the us’ CDC giving guidelines on how to convert a bandana or even a t-shirt to a face covering. gl! [link]” (April 11, 2020) |
5 | Datum, Trump, hospital, administration, send, report, Trump_administration, and control | 19,098 (6.57) | Commenting on COVID-19 data reporting being routed away from the CDC | “@[tag] let's make sure coronavirus reporting isn't politicized. If data isn't going to the CDC it needs to be transparent. What a crazy move in the middle of a pandemic!” (July 15, 2020) |
6 | Test, positive, antibody, test_positive, result, virus, kit, and antibody_test | 18,937 (6.51) | Tweets focusing on the accuracy and inaccuracy of antibody tests | “@[tag] @[tag] CDC? the agency that used COVID19 tainted tests in the beginning of February? Where did CDC get the tests?” (April 22, 2020) |
7 | Trump, pandemic, response, fund, test, cut, president, and administration | 18,888 (6.50) | Commenting on the Trump administration’s health care policies, focusing on cutting CDC funding and dismantling the pandemic response team | “@[tag] @[tag] President Obama created the best pandemic unit in the world. The whole world looked at it as shining example. That pandemic teams' purpose was to secure preparedness for the USA! Republicans fired them in 2018, cut funds for CDC and gave tax breaks to rich!” (March 16, 2020) |
8 | Symptom, read, update, pandemic, virus, outbreak, guidance, and list | 18,326 (6.30) | CDC adding new symptoms of COVID-19 to its list: six symptoms in April and three in June 2020 | “The us centers for disease control and prevention has added six new symptoms of COVID-19 to its list: chills, repeated shaking, muscle pain, headache, sore throat, new loss of taste or smell.” (April 28, 2020) |
9 | School, reopen, child, risk, guideline, kid, guidance, and report | 18,201 (6.26) | Worries about reopening schools, and discussion of children’s risk of COVID-19 | “Exactly why we shouldn't be opening schools before they can meet the CDC requirements. [link]” (July 18, 2020) |
10 | Flu, people, die, death, million, American, vaccine, and month | 16,950 (5.83) | Comparing the number of COVID-19 deaths with the number of flu deaths | “COVID-19 has led to more than 454,000 illnesses and more than 20,550 deaths worldwide. In the US alone, the flu (also called influenza) has caused an estimated 38 million illnesses, 390,000 hospitalizations and 23,000 deaths this season, according to the CDC.” (April 11, 2020) |
11 | Spread, virus, China, surface, travel, easily, Wuhan, and January | 15,223 (5.24) | A wide range of discussion surrounding the spread of COVID-19: China’s warning of human-to-human transmission in January 2020, travel restrictions, and surface spread | “Coronavirus ‘does not spread easily’ by touching surfaces or objects, CDC says. But it still ‘may be possible.’ [link] via @[tag] possible but not likely - time to get over it. like any germ.” (May 21, 2020) |
12 | Health, public, datum, public_health, government, report, and agency | 15,010 (5.16) | Tweets discussing the CDC not leading the control over COVID-19 | “With the white house now having all COVID-19 data and not allowing the CDC to monitor it, the information will no longer be provided to the public. This is a scary time where the government will no longer tell us about the pandemic.” (July 16, 2020) |
13 | Rate, news, death_rate, death, estimate, low, infection, and report | 13,963 (4.80) | Death rate of COVID-19, with a considerable number of tweets emphasizing the low death rate | “Best estimate is 0.4 percent death rate for COVID-19 patients with symptoms: CDC [link]” (May 22, 2020) |
14 | Director, Dr, Redfield, warn, Fauci, wave, bad, and Robert | 13,393 (4.61) | Quoting CDC director Dr Robert Redfield’s statements, especially the warning of a deadlier second wave | “CDC director warns that COVID-19 could return in winter combined with flu in deadlier second wave - Q13 Fox News [link]” (April 22, 2020) |
15 | Home, nurse, patient, health, care, nurse_home, hospital, and official | 10,906 (3.75) | Comments on the practice of sending COVID-19 patients to nursing homes | “#reopennj #openamericanow #trump2020 Murphy administration ignored advice and sent COVID-19 patients to nursing homes | mulshine [link]” (May 22, 2020) |
16 | House, white, White_House, force, task, task_force, Trump, and CNN | 10,530 (3.62) | Tension between the White House and the CDC, focusing on CDC being sidelined in COVID-19 fight | “The White House’s coronavirus task force response coordinator, Deborah Birx, said in a recent meeting that ‘there is nothing from the CDC that I can trust,’ the Washington Post reported. Surprised?” (May 12, 2020) |
Based on the number of tweets, the most tweeted theme was knowing the virus and the situation (132,139/290,764, 45.45%). This theme consisted of seven topics: discussion of COVID-19 death counting (35,347/290,764, 12.16%), accuracy of antibody tests (18,937/290,764, 6.51%), new symptoms added to the list of COVID-19 (18,326/290,764, 6.30%), number of COVID-19 deaths (16,950/290,764, 5.83%), spread of COVID-19 (15,223/290,764, 5.24%), death rate of COVID-19 (13,963/290,764, 4.80%), and the CDC director Dr Robert Redfield's warning of a deadlier second wave (13,393/290,764, 4.61%). This theme reflected the public's desire to understand both the virus (how it spreads, the symptoms of infection, and the risk of death) and the situation (whether the current response is accurate and effective, and how the situation will change). Death-related discussion of COVID-19, covering death counting, the number of deaths, and the death rate, dominated this theme.
The second most tweeted theme was policy and government actions (92,633/290,764, 31.86%). This theme consisted of six topics: comments on COVID-19 data reporting being routed away from the CDC (19,098/290,764, 6.57%), the Trump administration's health care policies (18,888/290,764, 6.50%), the policy of reopening schools (18,201/290,764, 6.26%), the CDC not leading the control over COVID-19 (15,010/290,764, 5.16%), the practice of sending COVID-19 patients to nursing homes (10,906/290,764, 3.75%), and the tension between the White House and the CDC (10,530/290,764, 3.62%). Tweets under this theme featured comments that challenged the government’s actions and policies. Many tweets mentioned that the dismissal of the pandemic response team in 2018 and cuts to the CDC’s funding weakened the CDC during the COVID-19 pandemic. When the government announced on July 15, 2020, that COVID-19 hospital data would not be reported to the CDC, the number of tweets on the topic of the CDC not leading the control over COVID-19 set records for both a single day and a single week (5954 tweets on July 15, 2020, and 13,392 tweets in the week starting on July 15, 2020; see
The third most tweeted theme was response guidelines (39,966/290,764, 13.75%), which concerned how to respond to COVID-19. This theme covered two topics: the CDC's COVID-19 guidelines with a focus on social distancing and stay-at-home orders (20,032/290,764, 6.89%) and the CDC's recommendation of wearing face masks (19,934/290,764, 6.86%). Both topics were widely discussed on Twitter, ranking third and fourth, respectively, by number of tweets per topic. Most of the tweets under this theme suggested that the CDC's guidelines for individuals, businesses, and other organizations should be followed. Many tweets shared CDC links for further details; one of the most commonly shared was the CDC's video tutorial on making cloth masks.
The topic of general opinion about the credibility of the CDC and other authorities in charge, such as Dr Fauci, Dr Birx, President Donald Trump, the Food and Drug Administration, and the WHO, stands alone as its own theme. It was the fourth most tweeted theme (26,026/290,764, 8.95%) and the second most tweeted topic, trailing only the topic of COVID-19 death counting. Unlike the other topics, the tweets under this topic did not point to one or a few specific issues; instead, they usually expressed general opinions, sometimes accompanied by emotions. Words reflecting “credibility,” such as “lie,” “trust,” “listen,” “hoax,” “conspiracy,” “stupidity,” and “fail,” were frequently used by Twitter users. However, the negative words did not always point to the CDC; a substantial number of tweets grouped under this same theme asked people to stand with the CDC and listen to the scientists (see the representative tweets of this topic in
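The theme-level figures reported in this section can be cross-checked against the per-topic counts in the topic table. The short Python sketch below does this arithmetic; the counts are copied from the table and the topic-to-theme grouping follows the text of this section, while the variable names are illustrative only.

```python
TOTAL = 290_764

# Tweet counts per topic number, copied from the topic table.
topic_counts = {
    1: 35_347, 2: 26_026, 3: 20_032, 4: 19_934, 5: 19_098, 6: 18_937,
    7: 18_888, 8: 18_326, 9: 18_201, 10: 16_950, 11: 15_223, 12: 15_010,
    13: 13_963, 14: 13_393, 15: 10_906, 16: 10_530,
}

# Topic-to-theme grouping as described in the Results text.
themes = {
    "knowing the virus and the situation": [1, 6, 8, 10, 11, 13, 14],
    "policy and government actions": [5, 7, 9, 12, 15, 16],
    "response guidelines": [3, 4],
    "general opinion about credibility": [2],
}

theme_totals = {
    name: sum(topic_counts[t] for t in topics) for name, topics in themes.items()
}

for name, n in theme_totals.items():
    print(f"{name}: {n:,} ({100 * n / TOTAL:.2f}%)")

# Death counting, number of deaths, and death rate (topics 1, 10, and 13).
death_related = sum(topic_counts[t] for t in (1, 10, 13))
print(f"death-related topics: {death_related:,} ({100 * death_related / TOTAL:.2f}%)")
```

The four theme totals add up to 290,764, which is consistent with each tweet being counted under exactly one topic.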
Judging by the quantity of tweets, the public's most prominent concern was death: 22.79% of tweets fell under the three death-related topics (death counting, number of deaths, and death rate). Previous infoveillance studies of Twitter data in the early period of COVID-19 found that 4.34% of tweets were about death reporting [
The majority of the public discussion involved how to act during the COVID-19 pandemic. This echoed past crisis research: in risky environments, the first information that should be conveyed to the public is the information that instructs the public on how to protect themselves in the threatened environments [
As to the CDC's performance in the COVID-19 response, the public expressed mixed comments. One factor contributing to this may be that the CDC has not played a central role in controlling the pandemic; this deviated from what the CDC had done historically during epidemics [
There are a few limitations to this study. First of all, tweets from accounts marked as private might be missed in the data collection, and tweets generated by bots or fake accounts might not have been filtered. Second, this study identified topics from the public discussion about the CDC but did not examine the temporal variance of topics. Although this is not in our research scope, it may deepen our understanding of how the public changed their focus as time and specific situations during that time changed. Therefore, we highly suggest that future studies put emphasis on the temporal dimension of online public discussion about COVID-19 to get more insight into the formation and variation of the discussion topics. Third, this study did not investigate the public’s emotions shown in the tweets, which is an important dimension of the public discussion. Future research in this line of study may shed light on the public's affective response to the CDC's actions and may inform the CDC about the public's emotions to be addressed during the pandemic. Lastly, Twitter users do not represent the US population [
In public health crises, social media platforms, such as Twitter, can provide valuable databases for public health agencies to understand the public's concerns, focus of attention, and expectations. The ability of text mining to derive high-quality information from massive data sets is ideal for performing surveillance work. Especially in a protracted pandemic such as COVID-19, quickly and efficiently identifying the topics within the public discussion on Twitter would provide insight for the next round of public health communication in order to mitigate public concerns and avoid the spread of misinformation.
ASCII: American Standard Code for Information Interchange
CDC: Centers for Disease Control and Prevention
GUI: graphical user interface
JSON: JavaScript Object Notation
LDA: latent Dirichlet allocation
WHO: World Health Organization
This study was supported by the National Cancer Institute Grant T32 CA 113710.
None declared.