Published in Vol 23, No 5 (2021): May

Mining and Validating Social Media Data for COVID-19–Related Human Behaviors Between January and July 2020: Infodemiology Study

Original Paper

1 Analytics, Intelligence, and Technology, Los Alamos National Laboratory, Los Alamos, NM, United States

2 Computer Science, University of New Mexico, Albuquerque, NM, United States

Corresponding Author:

Ashlynn R Daughton, MPH, PhD

Analytics, Intelligence, and Technology

Los Alamos National Laboratory

P.O. Box 1663

Los Alamos, NM, 87545

United States

Phone: 1 505 664 0062


Background: Health authorities can minimize the impact of an emergent infectious disease outbreak through effective and timely risk communication, which can build trust and adherence to subsequent behavioral messaging. Monitoring the psychological impacts of an outbreak, as well as public adherence to such messaging, is also important for minimizing long-term effects of an outbreak.

Objective: We used social media data from Twitter to identify human behaviors relevant to COVID-19 transmission, as well as the perceived impacts of COVID-19 on individuals, as a first step toward real-time monitoring of public perceptions to inform public health communications.

Methods: We developed a coding schema for 6 categories and 11 subcategories, which covered a wide range of behaviors as well as codes focused on the impacts of the pandemic (eg, economic and mental health impacts). We used this schema to create training data and develop supervised learning classifiers for classes with sufficient labels. Classifiers that performed adequately were applied to our remaining corpus, and temporal and geospatial trends were assessed. We compared the classified patterns to ground truth mobility data and actual COVID-19 confirmed cases to assess the signal achieved here.

Results: We applied our labeling schema to approximately 7200 tweets. The worst-performing classifiers had F1 scores of only 0.18 to 0.28 when trying to identify tweets about monitoring symptoms and testing. Classifiers about social distancing, however, were much stronger, with F1 scores of 0.64 to 0.66. We applied the social distancing classifiers to over 228 million tweets. We showed temporal patterns consistent with real-world events, and we showed correlations of up to –0.5 between social distancing signals on Twitter and ground truth mobility throughout the United States.

Conclusions: Behaviors discussed on Twitter are exceptionally varied. Twitter can provide useful information for parameterizing models that incorporate human behavior, as well as for informing public health communication strategies by describing awareness of and compliance with suggested behaviors.

J Med Internet Res 2021;23(5):e27059



Health authorities can minimize the impact of an emergent infectious disease through effective and timely risk communication, vaccines and antiviral therapies, and the promotion of health behaviors, such as social distancing and personal hygiene practices [1-4]. Of these, official communication is the earliest available strategy, and its effectiveness will build trust and adherence to the remaining measures [1]. During the H1N1 influenza pandemic in 2009, most countries focused on the promotion of health behaviors [2] such as mask-wearing, avoidance of crowds, and increased disinfection after observing that such protocols contributed substantially to reduced transmission and ultimate control of disease during the SARS outbreak in 2003 [5]. Health authorities have paid less attention to the psychological factors associated with a pandemic [3,4], though such factors play a vital role in subsequent adherence to health behaviors and vaccine uptake [1]. During the emergence of the Zika virus in 2016, public health guidelines focused on preventing sexual transmission by using condoms, avoiding travel to locations with active Zika transmission, and mosquito control [6], with varying levels of compliance [7,8].

Research into the use of social media and internet data for health surveillance is a growing field. Individuals discuss a wide variety of health concerns and health behaviors online, from symptom searching [9] and personal experiences with infectious diseases [2] to dieting [10] and electronic cigarette use [11]. These data have been used to identify prominent points of discussion in relation to health topics [12-14], which can point toward more effective health policies and interventions. In addition, social media and internet data reflect temporal and spatial patterns in health behavior [9-12,15]. The association between internet data and health behavior, topics, and attitudes relevant to the public provides insight into the manner in which individuals receive health information and how that information may translate into behavioral change. Specifically for disease outbreaks, internet and social media data provide opportunities for public health officials to monitor prevalent attitudes and behaviors at a given time to target further interventions and policies.

In this work, we used social media data to better understand human behaviors relevant to COVID-19 transmission and the perceived impacts of COVID-19 on individuals. We developed a coding schema for 6 categories and 11 subcategories, which covered a wide range of behaviors as well as codes focused on the impacts of the pandemic (eg, economic and mental health impacts). We applied this schema to approximately 7200 tweets and developed supervised learning classifiers for classes with sufficient labels. We then applied these classifiers to an extensive Twitter data set and showed patterns in human behaviors temporally and spatially across the United States.

We specifically focused on the following research questions:

  1. Research Question 1: What behaviors related to COVID-19 are discussed on social media websites, specifically Twitter? Using content analysis techniques similar to other social media studies (eg, Ramanadhan et al [16] and Carrotte et al [17]), we identified behaviors discussed on Twitter that could be relevant to disease transmission or the downstream impacts of COVID-19. At the outset, we were particularly interested in social distancing, hygiene, and personal protective equipment practices, but we were also interested in identifying the breadth of behaviors that might be discussed.
  2. Research Question 2: How do patterns in behaviors change geospatially and temporally in the United States? Using labeled data from Research Question 1, we built classification models to identify behaviors in the larger Twitter corpus. We were interested in temporal and geospatial trends in these classified data with the goal of observing regional patterns and temporal changes that occurred in conjunction with real-world events. Prior work has used similar methods to observe patterns during Zika emergence in 2016 [15].
  3. Research Question 3: How do these trends compare to other data streams, like mobility data sets? Prior work has shown that social media data are biased in multiple ways [18,19]. One way to validate our findings is to compare results using social media data to other data sources that have been useful to measure human behavior during the COVID-19 pandemic. In particular, several studies have shown that mobility data sets that rely on mobile devices (eg, smartphones) have been useful at accurately gauging reduced mobility [20,21].


For this work, we used a data set of tweets provided by Chen et al [22]. Data collection started on January 28, 2020, and used Twitter’s search application programming interface (API) to get historical tweets as early as January 21, 2020. They started with 14 keywords related to the novel coronavirus, and later expanded both keywords and individual accounts tracked over time. The data relied on Twitter’s streaming API, and are thus a 1% sample of tweets that include the keywords. The original repository contained about 270 million tweets as of mid-July 2020 [22]. Of these, we were able to collect 84% (N=228,030,309).

Schema Development

The coding schema was developed by three of the authors (AD, DG, and CS) through iterative analysis of random samples of tweets from our corpus. We started initially with categories of interest (eg, social distancing and personal protective equipment) and added both categories and subcategories as they were identified in tweets, similar to prior work [16,17]. The final schema is hierarchical, where annotators can label categories and, if applicable, subcategories within the category of interest (Figure 1).

Figure 1. Decision tree for labeling.

Personal and impersonal viewpoints were labeled separately from the tweet category. Here, a personal viewpoint is a tweet that describes a direct observation of the behavior, meaning the individual tweeting talks about their own behavior, or a person or event that the user can directly observe. For example, the tweet “I am wearing a mask when I go out” is a personal mention of personal protective equipment, specifically mask-wearing. An impersonal viewpoint, in contrast, includes actions like sharing articles, retweeting, or expressing an opinion without providing evidence that the user themself engages in the behavior (eg, “Ugh, I wish more people wore masks”). This definition is the same as prior work [15]. Of note, tweets were only labeled as personal or impersonal if they were already labeled with a category. Tweets that were outside the labels of interest were not labeled for viewpoint.

Training Data

Training Annotators and Annotation

To create our training data set, 7278 tweets were selected at random from the English tweets we collected between January and May 2020, as labeling commenced in May. Using the above schema, we then trained three additional annotators. Annotators were trained using the following steps. First, a member of the team (AD) met with each individual prospective annotator and thoroughly described the schema. The prospective annotator and AD first labeled 16 example tweets together using tweets already labeled during schema development. The annotator then individually coded 160 additional tweets previously labeled by the authors. If agreement was sufficiently high (>0.6), the annotator was then given their own section of training data to code. Each tweet in our training data set was coded by two such annotators. All annotators met weekly to discuss questions about labels. All tweets with disagreements were resolved by a third annotator or via group discussion. The workflow to label tweets is given in Figure 1. Tweets can be labeled with more than one label, as applicable.

Annotator Agreement

Annotator agreement varied. Personal and impersonal labels had agreements of 0.41, 0.44, and 0.41 between the three pairs of annotators. Category-level labels had agreements of 0.77, 0.82, and 0.82, and subcategory-level labels had agreements of 0.61, 0.65, and 0.66. Distinguishing between personal and impersonal tweets was the hardest classification task because it is inherently difficult to correctly identify voice in the span of 280 characters, especially without additional context. Prior work has relied on the use of personal pronouns (eg, “I,” “we,” and “our”) to identify personal tweets [15], but it is clear that this method has a high false negative rate because of linguistic patterns like pronoun-drop (eg, the tweet “Went to the store today and nobody was wearing masks” drops the pronoun “I” and leaves it implied) [23]. Thus, despite the difficulty in labeling these tweets, we believe manual annotation is preferable to automated, pronoun-based methods.
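The agreement statistic is not named in this section. As an illustration only, the sketch below assumes a chance-corrected measure such as Cohen's kappa computed for one pair of annotators; the function name and the toy labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items.

    Assumption: the paper does not state which agreement statistic was
    used; kappa is shown here only as one common choice.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1.0 - expected)


# Toy example: "p" = personal, "i" = impersonal.
annotator_1 = ["p", "p", "i", "i"]
annotator_2 = ["p", "i", "i", "i"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Under this assumption, a prospective annotator would pass the training threshold described earlier when the statistic exceeds 0.6.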

Classification Algorithms

Tweet Preprocessing

Tweet URLs and usernames (@-mentions) were replaced with the tokens “URL” and “USERNAME,” respectively. Consecutive characters were truncated (eg, “greaaaaaat” was truncated to “great”) and punctuation was removed. Of the training data, 15% were reserved as the test set. Tweets were split using stratified sampling based on the category labels to preserve label proportions. Because of the small number of labels in several categories (Table 1), we only attempted to make classifiers for the following: personal or impersonal, social distancing (category), shelter-in-place (subcategory), monitoring (category), hygiene (category), and personal protective equipment (category).
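The preprocessing steps above could be sketched roughly as follows. The regular expressions, the rule that runs of three or more identical characters collapse to one, and the function name are assumptions made for illustration; the exact rules used in the study are not given here.

```python
import re
import string

# Assumed patterns; the study's exact rules are not specified.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
MENTION_PATTERN = re.compile(r"@\w+")
REPEAT_PATTERN = re.compile(r"(.)\1{2,}")  # 3+ copies of the same character
PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def preprocess_tweet(text):
    """Replace URLs and @-mentions, truncate repeated characters, drop punctuation."""
    text = URL_PATTERN.sub("URL", text)
    text = MENTION_PATTERN.sub("USERNAME", text)
    text = REPEAT_PATTERN.sub(r"\1", text)   # "greaaaaaat" -> "great"
    text = text.translate(PUNCTUATION_TABLE)
    return " ".join(text.split())            # normalize whitespace


cleaned = preprocess_tweet("@bob this is greaaaaaat!!! https://t.co/abc")
```

URLs are replaced before the repeated-character step so that the placeholder tokens are not altered by later rules.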

Because personal and impersonal labels were only assigned to tweets if they fell into a category, the training data for this classifier were only those tweets with an initial label. In contrast, all other classifiers were binary, and their negative class comprised all tweets that did not have the label of interest, including tweets with no labels. As such, all classification models were built using extremely disproportionate label distributions.

Logistic Regression

Logistic regression models were implemented in Python, version 3.7.7 (Python Software Foundation), using scikit-learn [24] and the elastic net penalty. Features included all unigrams, bigrams, and trigrams of tweet text. To optimize models, grid search was used with all possible combinations of the following parameters: the elastic net penalty varied the L1 ratio from 0 (equivalent to only “L2” penalty) to 1 (equivalent to only “L1” penalty), regularization strength varied in order of magnitude from 0.001 to 1000, and chi-square feature selection was varied from 10% to 100% of the features (ie, no feature selection), in steps of 10%, to explore the impact of feature reduction on model performance.

Random Forest

Random forest models were implemented using scikit-learn’s random forest classifier [24]. As in logistic regression, features included all unigrams, bigrams, and trigrams of tweet text. Again, grid search was used to optimize models. The minimum number of samples per leaf node was varied from 2 to 11 (in steps of 3), the minimum number of samples required to split an internal node ranged from 2 to 52 (in steps of 10), and the number of trees per forest was either 50 or 100. Last, we additionally varied the number of features. Because of the larger number of parameters tested here, we tried feature selections of 25%, 50%, 75%, or 100% of features (ie, no feature selection).
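To make the size of the two search spaces concrete, the sketch below enumerates the grids described for both model types using only the standard library. The dictionary keys are illustrative names, not confirmed arguments from the study's code. The step of 0.25 for the L1 ratio is an assumption, since the text gives only the 0 to 1 range, and whether the reported regularization values map directly to scikit-learn's `C` (an inverse strength) is not stated.

```python
from itertools import product


def expand_grid(grid):
    """Return every combination of the listed parameter values."""
    keys = list(grid)
    return [dict(zip(keys, values))
            for values in product(*(grid[k] for k in keys))]


# Logistic regression grid (L1-ratio step of 0.25 is an assumption).
LOGISTIC_GRID = {
    "l1_ratio": [0.0, 0.25, 0.5, 0.75, 1.0],
    "regularization": [10.0 ** e for e in range(-3, 4)],  # 0.001 ... 1000
    "feature_percentile": list(range(10, 101, 10)),       # 10% ... 100%
}

# Random forest grid, taken directly from the ranges in the text.
RANDOM_FOREST_GRID = {
    "min_samples_leaf": list(range(2, 12, 3)),    # 2, 5, 8, 11
    "min_samples_split": list(range(2, 53, 10)),  # 2, 12, ..., 52
    "n_estimators": [50, 100],
    "feature_percentile": [25, 50, 75, 100],
}

logistic_candidates = expand_grid(LOGISTIC_GRID)
forest_candidates = expand_grid(RANDOM_FOREST_GRID)
```

With these assumed values, the random forest grid has 192 combinations, which is consistent with the coarser feature-selection steps chosen for that model.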

Classification and Bias Adjustments

Both types of models performed poorly for classifying monitoring, personal protective equipment, and hygiene. As such, we did not use these models for downstream analysis. Rather, we focus on the personal or impersonal model, the social distancing classifier, and the shelter-in-place classifier.

Though random forest models sometimes produced slightly higher F1 scores, we used the logistic regression models for overall classification and downstream analysis because of the slightly higher precision values. Said another way, in this context, we preferred fewer false positives to slightly more false negatives because we were trying to identify a particular behavior and wanted as few erroneous predictions as possible in the classified data.

To combat the bias inherent in our classifiers, as it is clear that misclassification will occur, we used the method suggested by Daughton and Paul [25] to create confidence intervals that account for classifier error. The basic principle is to use bootstrapping to generate many samples and to subsequently weight individual classifications by the positive predictive value or negative predictive value of the classifier. The bootstrapped samples are then used to generate a 95% confidence interval around the point estimate (see Daughton and Paul [25] for full details). This method has been successfully applied in similar work focused on identifying travel change behaviors in response to Zika [15]. For this work, we used 100 bootstrapped samples to generate daily confidence intervals.
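The general idea can be sketched as below. This is a simplified reading, not the procedure from Daughton and Paul [25]: it assumes each tweet predicted positive contributes its positive predictive value, each tweet predicted negative contributes one minus the negative predictive value, and a percentile interval is taken over bootstrap resamples. The function name and the example values are hypothetical.

```python
import random


def adjusted_proportion_ci(predictions, ppv, npv, n_boot=100, alpha=0.05, seed=0):
    """Bootstrap interval for the proportion of positive tweets, weighted by
    classifier error (a simplified sketch; see [25] for the actual method)."""
    if not predictions:
        raise ValueError("predictions must be non-empty")
    rng = random.Random(seed)
    # Weight for each tweet: probability that it is truly positive.
    weights = [ppv if p else 1.0 - npv for p in predictions]
    n = len(weights)
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(weights, k=n)  # resample with replacement
        estimates.append(sum(sample) / n)
    estimates.sort()
    lower = estimates[int((alpha / 2) * (n_boot - 1))]
    upper = estimates[int(round((1 - alpha / 2) * (n_boot - 1)))]
    median = estimates[n_boot // 2]
    return lower, median, upper


# One hypothetical day: 30 tweets predicted positive, 70 predicted negative.
day_predictions = [1] * 30 + [0] * 70
lower, median, upper = adjusted_proportion_ci(day_predictions, ppv=0.73, npv=0.90)
```

Running this once per day with 100 resamples would give a daily interval of the kind described above.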

Geospatial Analysis and Comparison to Mobility and COVID-19 Data

We compared the results of our classifiers to mobility data from Descartes Labs [26], described in Warren and Skillman [27], to provide a ground truth measurement of social distancing, and to the number of confirmed COVID-19 cases in each state, as tracked by The New York Times [28]. The mobility data used geolocation services from mobile devices (eg, smartphones) to generate aggregate estimates about mobility within specific geographic areas. Descartes Labs provides mobility data at admin level 1 (state) and admin level 2 (county) [26]. For this work, we only consider state mobility. Descartes Labs uses a normalized value of the median maximum distance traveled each day: the m50 index. Here, data are normalized using the median mobility per state between February 27 and March 7 (ie, a pre–COVID-19 window). For this work, we looked at the percent change in mobility (m50 index – 100) [27], which can be interpreted as the percent change in mobility relative to the baseline period.
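As a small worked example of the normalization, the sketch below assumes the m50 index is the daily median maximum distance expressed as a percentage of the baseline-window median, so that subtracting 100 gives the percent change. The function name and the distances are made up for illustration.

```python
from statistics import median


def percent_change_in_mobility(daily_m50, baseline_m50):
    """Percent change relative to the baseline median (m50 index - 100).

    Assumption: m50 index = 100 * m50 / median(m50 over the baseline window).
    """
    baseline = median(baseline_m50)
    if baseline <= 0:
        raise ValueError("baseline median must be positive")
    return [100.0 * value / baseline - 100.0 for value in daily_m50]


# Hypothetical distances for one state (units do not matter after normalizing).
baseline_window = [8.0, 10.0, 12.0]  # baseline median is 10.0
later_days = [10.0, 5.0, 2.5]
changes = percent_change_in_mobility(later_days, baseline_window)
```

In this toy case, a day with half the baseline distance yields a value of -50, that is, a 50% reduction in mobility.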

We used these data as a ground truth data set to validate social media tweets about social distancing and sheltering-in-place. For these comparisons, we restricted our data to those with geolocation services enabled (ie, those that used the tweet “place” to determine location), which we then aggregated by state. Here, data were aggregated to weekly data, and any weeks with fewer than 50 tweets were removed. States with fewer than 10 data points were excluded from visualization.
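The weekly aggregation and filtering step might look like the following for a single state. Grouping by ISO week and the input format of `(date, is_social_distancing)` pairs are assumptions; only the 50-tweet minimum comes from the text.

```python
from collections import defaultdict
from datetime import date


def weekly_proportions(tweets, min_tweets=50):
    """Proportion of positive tweets per week, dropping sparse weeks.

    `tweets` is an iterable of (date, is_positive) pairs for one state.
    Weeks are ISO (year, week) pairs, which is an assumed convention.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for day, is_positive in tweets:
        iso = day.isocalendar()
        week = (iso[0], iso[1])
        totals[week] += 1
        positives[week] += int(is_positive)
    return {
        week: positives[week] / totals[week]
        for week in sorted(totals)
        if totals[week] >= min_tweets  # weeks with fewer tweets are removed
    }


# Toy data: one week with 60 tweets (15 positive) and one with only 10.
toy_tweets = (
    [(date(2020, 3, 23), True)] * 15
    + [(date(2020, 3, 23), False)] * 45
    + [(date(2020, 3, 30), True)] * 10
)
state_series = weekly_proportions(toy_tweets)
```

A state whose resulting series has fewer than 10 weekly points would then be left out of the visualization.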

Content Analysis and Labels

In total, 7278 tweets were read and labeled. Of these, 2202 tweets fell into the categories shown in Table 1. For each category and subcategory, the definition and an example anonymized tweet are shown. The most prevalent category by far was tweets about social distancing. Of these tweets, the vast majority were about sheltering-in-place, broadly defined, including tweets about adjusting to life at home (eg, work or school from home); tweets about entertainment, including hobbies and recipes; tweets about plans that were canceled (eg, parties and weddings); and a few tweets about a supposed “coronaboomer” phenomenon, where some suggested that the additional time spent at home would lead to an increase in babies born in 2021. In addition, we identified 53 tweets related to the mental health impacts of social distancing, including tweets about tactics to maintain positive mental health, as well as tweets describing the mental health difficulties associated with social distancing.

In other categories, we again saw a wide variety of health topics discussed. This included tweets about monitoring, of which roughly a third were about access to or experiences with COVID-19 testing; hygiene, including handwashing and cleaning protocols; and a few tweets (n=49) weighing in on COVID-19 vaccine development. Last, we also saw instances of tweets about the economic impacts of COVID-19, including on the supply chain and in terms of unemployment.

Table 1. Tweet content and relative proportion.
| Category and subcategories | Definition | Example tweet (anonymized) | Tweets (N=7278), n (%) |
| --- | --- | --- | --- |
| Social distancing: all subcategories | Discusses social distancing in either a positive or a negative way (eg, not physically seeing friends and family, not going to work, or discussing reasons why lockdowns are unnecessary) | “COVID-19 SUCKS! I can’t see my family and I really miss them.” | 1494 (20.5) |
| Shelter-in-place | Discusses any aspects of shelter-in-place or stay-at-home policies; includes school or daycare (or homeschool), remote work, things to do to keep busy while staying home (eg, hobbies and recipes), canceled plans, delivery services (to avoid going out in public), and the supposed phenomenon that birth rates would increase after the pandemic (“coronaboomers”) | “State going into lockdown tomorrow. I can work from home but I’m also going to catch up on my backlog Steam library!” | 1117 (15.3) |
| Mental health | Discussions about mental health; includes suggestions of activities to maintain mental health while sheltering-in-place and documents about the mental health difficulties associated with COVID-19 and social distancing | “I’m so stressed I’m going to cry. I don’t want to be where I am now, I just want to be alone for quarantine.” | 53 (0.7) |
| Voting | Decisions around voting by mail (eg, for COVID-19–related safety reasons or the opposite opinion) | “Record high cases in the past few days. It’s been two weeks since the election.” | 12 (0.2) |
| Hoarding | Storing things like food, medicines, and disaster supplies | “Got a bunch of masks and gloves in case the coronavirus becomes a big deal here.” | 31 (0.4) |
| Public events | Descriptions of going to public places and choosing to not socially distance | “Airport security was super fast -- no lines at bag check.” | 53 (0.7) |
| Monitoring: all subcategories | Behavior monitoring for illness; includes monitoring friends or family that have the disease | “I keep coming across people with sore throats and cold symptoms today. Hope it’s not COVID!” | 315 (4.3) |
| Testing | Ability or inability to get tested for COVID-19 infection; includes tweets expressing desires for improvements and increases in testing and novel testing strategies (eg, drive-through testing centers), or in combination with other tactics like contact tracing | “The complete failure in testing ramp up is really disappointing.” | 116 (1.6) |
| Remedies | Unproven treatments, advice, and/or ways to “prevent” or “cure” the disease using natural methods (eg, vitamin D) | “Anti Neo Plastons is the natural cure for Coronavirus and your body makes them naturally!” | 84 (1.2) |
| Hygiene: all subcategories | Trying to prevent sickness by using good hygiene, including handwashing, cleaning and sanitation, and other cleanliness-related behaviors | “Just saw a kid about to use the water fountain. Their parent grabbed them and said ‘NOOOOOOOO… there could be COVID!’” | 94 (1.3) |
| Personal protective equipment: all subcategories | Using personal protective equipment to prevent illness; includes masks and gloves | “1.) Wear your mask 2.) Social distance 3.) Wash your hands! We can do this!” | 164 (2.3) |
| Provaccine | Tweets that are positive and supportive of vaccine efforts | “The work on the COVID vaccine is amazing. I can’t wait to get it!” | 31 (0.4) |
| Antivaccine | Tweets that use vaccine-averse rhetoric to describe why a vaccine will be unsafe or ill-advised | “I hope you’re not in favor of the Gates vaccine. I’m not going to be tracked by a microchip!” | 18 (0.2) |
| Supply chain | Information or commentary about supply chain–related issues; includes information about “price gouging” | “Can we trust the food supply chain? Should we start growing our own fruits and vegetables?” | 33 (0.5) |
| Unemployment | Includes descriptions of applying for unemployment benefits or commentary on the process; includes stimulus checks or commentary about unemployment or underemployment due to COVID-19 | “I’m a driver for Uber, but I was put on medical leave after COVID-19 exposure & haven’t made any money since.” | 53 (0.7) |

A breakdown of categories by personal and impersonal labels is shown in Figure 2 (a), and subcategories are shown in Figure 2 (b). Overall, a small fraction of tweets were personal mentions; the majority of tweets were impersonal mentions related to each category (eg, mentions of articles or general opinions and suggestions that do not describe a personal behavior). This is consistent with prior work, which has found that personal mentions of health-related behavior on social media are rare [19].

Figure 2. Category distribution. Tweets are broken down by frequency of personal and impersonal labels (a) and by subcategory grouped by category (b). Categories without subcategories are not shown in (b). Only categories with at least 80 labels, and subcategories with at least 50 labels, are shown. PPE: personal protective equipment.

Because there were so few tweets in most categories, it was not feasible to build robust classifiers for most categories or subcategories. For this work, we selected for classification only the personal and impersonal classification task; the categories of social distancing, monitoring, hygiene, and personal protective equipment; and the subcategory shelter-in-place. In general, we found similar performances between random forest and logistic regression (Table 2). The exception to this trend was in the personal protective equipment category, where the logistic regression model substantially outperformed the random forest.

For subsequent analysis, we focused on categories that achieved an F1 of at least 0.6: personal or impersonal, social distancing, and the shelter-in-place classifiers. We then applied the logistic regression models to the remaining data in our corpus of over 228 million tweets through July 2020.
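For reference, the F1 score is the harmonic mean of precision $P$ and recall $R$. As a worked check using the logistic regression values reported in Table 2 for the personal or impersonal classifier ($P = 0.76$, $R = 0.50$):

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.76 \times 0.50}{0.76 + 0.50}
    = \frac{0.76}{1.26}
    \approx 0.60
```

This sits right at the 0.6 threshold used to select classifiers for downstream analysis. Because the tabulated precision and recall are rounded, recomputing F1 from them can differ from a reported value by about 0.01.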

Table 2. Tweet classification results.
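| Classifier | Logistic regression: precision | Logistic regression: recall | Logistic regression: F1 score | Random forest: precision | Random forest: recall | Random forest: F1 score |
| --- | --- | --- | --- | --- | --- | --- |
| Personal or impersonal | 0.76 | 0.50 | 0.60 | 0.72 | 0.57 | 0.64 |
| Social distancing (category) | 0.73 | 0.59 | 0.66 | 0.71 | 0.61 | 0.66 |
| Shelter-in-place (subcategory of social distancing) | 0.69 | 0.60 | 0.64 | 0.65 | 0.65 | 0.65 |
| Monitoring | 0.72 | 0.17 | 0.28 | 0.32 | 0.13 | 0.18 |
| Hygiene | 0.50 | 0.29 | 0.36 | 0.33 | 0.21 | 0.26 |
| Personal protective equipment (eg, masks and gloves) | 0.59 | 0.52 | 0.55 | 0.40 | 0.24 | 0.30 |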

Temporal Patterns

Using the full classified corpus, we compared temporal patterns in social distancing tweets, shelter-in-place tweets, and the subsets of those groupings which were also classified as personal mentions, to important real-world events that occurred during the outbreak (Figure 3). Importantly, the proportion of tweets classified as social distancing and shelter-in-place tweets followed a predictable pattern with respect to real-world events occurring during the outbreak. Social distancing tweets occurred soon after the initial US COVID-19 case as people started to discuss initial reactions to the new disease. As states began to institute shelter-in-place orders—with California leading in late March 2020 [29]—the number of tweets about social distancing and sheltering-in-place doubled. Tweets in this category stayed high throughout the summer, as a large number of Americans were under shelter-in-place orders [29]. In early April 2020, estimates of the number of Americans told to stay at home were around 95%, despite widespread variation in how stay-at-home orders were implemented [30]. As expected, the number of personal tweets was a small fraction of the social distancing and shelter-in-place tweets more broadly. There was little variation in the temporal patterns of personal tweets; all signals came from the broader set of both personal and impersonal tweets.

Figure 3. Temporal patterns in social distancing and shelter-in-place tweets. The proportion of tweets classified as general social distancing, shelter-in-place, personal shelter-in-place, and personal social distancing are shown by date. Relevant events in the outbreak are shown as vertical lines. As states increased shelter-in-place and lockdown orders, the number of tweets about social distancing and sheltering-in-place dramatically increased. Shading shows the 95% CI calculated using classifier-adjusted bootstrapped sampling while the median is a solid line. CIs are extremely small at several time points. CA: California; SIP: shelter-in-place.

State Patterns: Comparisons to Mobility Data

To evaluate temporal patterns more closely, we considered patterns in individual states and compared them to mobility data derived from mobile phone devices (Figures 4-6) and the actual number of confirmed COVID-19 cases (Figure 6). At a high level, it is clear that there is an inverse relationship between the proportion of tweets about social distancing and the actual movement of individuals (Figure 4), indicating that social distancing conversations on Twitter may actually be reflective of real-world behavior. However, we can also see interesting regional patterns among states. For example, some of the earliest-hit states (eg, California, Washington, and New York) showed peaks in the number of tweets about social distancing in late March 2020 compared to states that saw comparatively few cases early on (eg, Florida and Georgia, which had peaks in the number of social distancing tweets in late April 2020).

Most states observed the lowest mobility in April 2020, as seen in Figure 5 (a). The day with the highest fraction of social distancing tweets was most often in March 2020, though many states observed this in April as well, as seen in Figure 5 (b). In general, most states observed these dates within ±20 days of each other, with the majority of states observing the day of minimum mobility before the day with the most tweets about social distancing, as seen in Figure 5 (c). Further, there is a moderate negative correlation between the mobility data and the classified Twitter data (Figure 6). Though patterns vary by state, the average correlation is –0.42. Some states show a notably weaker signal (eg, Arkansas, New Mexico, and Rhode Island), which could be caused in part by the relative lack of data in these states. Taken together, these results suggest a reasonably strong relationship between our classified Twitter data set and the ground truth mobility data. These patterns are not as clearly reflected in the relationship to confirmed COVID-19 cases. The average correlation between the proportion of tweets about social distancing and the number of confirmed COVID-19 cases is –0.08, though the strongest, which comes from Alabama, is –0.53. This suggests that, while social distancing discussions on social media are reflective of actual social distancing practices as measured by mobility data, the link to COVID-19 transmission is likely more complicated.
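The per-state comparison can be illustrated with a plain correlation between two aligned weekly series. The type of correlation is not stated in this section, so the sketch assumes Pearson's r; the function name and the numbers are invented for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long numeric series.

    Assumption: Pearson's r is used here only as a common default.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


# Invented weekly values for one state: the tweet proportion rises as
# the percent change in mobility falls.
tweet_proportion = [0.10, 0.20, 0.30, 0.40]
mobility_change = [-10.0, -20.0, -30.0, -40.0]
r = pearson_r(tweet_proportion, mobility_change)
```

A negative value indicates that weeks with more social distancing tweets were weeks with lower mobility; this toy series is perfectly inverse, whereas the state-level values reported above are far weaker.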

Figure 4. US state patterns in mobility compared to social distancing tweets from January to July 2020. Descartes Labs data showing a rolling 7-day average of percent change in mobility (divided by 5, to improve visualization) are plotted alongside the proportion of social distancing tweets per week. Both temporal and regional patterns are clear. Further, as the proportion of social distancing tweets increased, mobility measured by Descartes Labs decreased. States without sufficient Twitter data were removed from the grid. Two-letter abbreviations are used for each state.

Figure 5. Comparison of peak social distancing tweet proportions and minimum mobility. To validate our social media findings, we compared them to mobility data provided by Descartes Labs. Dates of minimum mobility are aggregated by month (a), while dates of highest proportion of tweets about social distancing, aggregated by month, are shown in (b). The difference, in days, between the date of minimum mobility and the date of highest proportion of social distancing tweets (c) shows that most states observed both peaks within 20 days of one another.

Figure 6. Correlation between confirmed COVID-19 cases or mobility and proportion of tweets about social distancing by US state. Most states have a moderate negative correlation between the proportion of tweets about social distancing and mobility data (yellow), indicating good agreement in the two signals. Some states have notably weaker negative correlations (eg, AR, NM, and RI), which could be the result of less Twitter data. Correlations between the number of confirmed COVID-19 cases and the proportion of tweets about social distancing are weak (blue), with a few notable exceptions (eg, AL). Two-letter abbreviations are used for each state.

Principal Findings

The ongoing COVID-19 outbreak clearly illustrates the need for real-time information gathering to assess evolving beliefs and behaviors that directly impact disease spread. Historically, such information would be gathered using survey methods [5,7,31], which are time-consuming, expensive, and typically lack the ability to measure temporal and spatial variation [32]. One proposed partial solution is to use internet data (eg, search query patterns and social media data), which have been shown to correspond to disease incidence in emergent infectious disease outbreaks [23,33-35], individual risk perception [1,36,37], and risk communication [38], and have been used to identify specific health behaviors [15]. During the early stages of the current COVID-19 pandemic, social media data have been used to monitor the top concerns of individuals [39,40], characterize COVID-19 awareness [41], compare social connectedness and COVID-19 hot spots [42], monitor misinformation [40,43-45], and rapidly disseminate information [46]. Last, social media has been used as an information gathering platform during periods of uncertain information. Disease emergence is a context wherein disease risks, transmission, and treatment may be largely unclear [46]. With this context in mind, we address our findings with respect to each research question below.

What behaviors related to COVID-19 are discussed on social media websites, like Twitter? We found that a wide variety of behaviors are discussed on social media, including mask-wearing, hygiene (eg, handwashing), testing availability and experiences, and social distancing practices. Prior work has found evidence that mask-wearing and limited mobility were behaviors adopted to reduce disease spread during SARS [5] and that handwashing would be commonly implemented by individuals during a hypothetical pandemic influenza [47]. This prior work, however, has relied on surveys to obtain data about the behaviors that individuals implement. The use of social media to complement such work would improve both the richness and the temporal and geographic scope of the data available.

Some of the identified tweets show evidence of sensitive topics. For example, we found 53 tweets related to individuals’ mental health. Prior research has found that social media can be used to identify individuals with a variety of mental health concerns, including depression [48] and suicide [14]. As there is considerable work emerging about the substantial mental health impacts of COVID-19 (eg, increases in domestic violence [49] as well as depression and anxiety [50]), this could prove to be an important avenue for future work in this field.

Last, we found a small number of tweets (n=49) about vaccination related to COVID-19, of which roughly a third (n=18) showed a negative attitude. Importantly, this study was conducted prior to the authorization of any vaccines in the United States. All of the tweets considered here discuss either vaccine development or a hypothetical COVID-19 vaccine. Prior research has found similarly negative tweets during the emergence of Zika [51] and the H1N1 influenza pandemic [52]. Future work analyzing these data could provide additional insight into specific reasons that populations may be hesitant to receive the COVID-19 vaccine and could inform targeted public health messaging.

How do patterns in behaviors change geospatially and temporally in the United States? As expected, the patterns in tweets classified as social distancing and shelter-in-place followed extremely similar trends. These patterns corresponded to important real-world events during the outbreak, suggesting that individuals were responding to actual events and some were describing their own personal behavior. We found, however, that tweets classified as personal mentions represented a very small subset of social distancing and shelter-in-place tweets. This is not unexpected, given that prior work has shown that personal mentions of health may be extremely uncommon [19].

How do these trends compare to other data streams, like mobility data sets, that have also shown promise in COVID-19 modeling efforts? Despite the lack of a temporal signal in tweets labeled as personal and social distancing, there was a stronger signal when comparing classified data to Descartes Labs’ mobility data. We observed meaningful regional differences between states and saw that, in general, the peak number of tweets about social distancing happened within a few weeks of the actual measured minimum in mobility. This suggests that social media data may be used as a proxy for sensor data in appropriately data-rich contexts. Recent work using geotagged Twitter data to create social networks and analyze social distancing in the context of policy decisions found similar relationships and supports this finding [53].
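The two comparisons described above (a per-state correlation between the proportion of social distancing tweets and mobility, and the gap between the tweet peak and the mobility minimum) can be illustrated with a minimal sketch. The function names and the weekly values below are hypothetical and made up for illustration; they are not study data, and this is not necessarily how the analysis in this paper was implemented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def peak_to_trough_lag(tweet_share, mobility):
    """Time steps from the tweet-share peak to the mobility minimum.

    A positive value means the mobility minimum came after the tweet peak.
    """
    peak = max(range(len(tweet_share)), key=tweet_share.__getitem__)
    trough = min(range(len(mobility)), key=mobility.__getitem__)
    return trough - peak


# Hypothetical weekly series for a single state (illustrative values only).
tweet_share = [0.01, 0.02, 0.08, 0.12, 0.09, 0.06, 0.04]  # share of tweets about social distancing
mobility = [100, 95, 60, 35, 30, 45, 60]                  # mobility index relative to baseline

r = pearson(tweet_share, mobility)
lag_weeks = peak_to_trough_lag(tweet_share, mobility)
```

In this toy series, the correlation is negative (more social distancing tweets when mobility is lower), which is the direction of agreement reported for most states in Figure 6.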


There are a number of limitations to consider in this work. The first is that, as mentioned above, it is known that social media data are biased in a number of ways, including demographically, and that bias differs by geographic areas [18]. Further, personal mentions of health-related information on Twitter are rare [19]. These are known limitations of using internet data and could potentially explain the variations in correlation we observed between social distancing posts and actual mobility data. Importantly, however, it is difficult to assess this without extensive prospective surveys conducted at the same time as tweet collection.

Our observed wide range in correlations between the proportion of social distancing tweets and actual COVID-19 cases in individual states is an example of the ecological fallacy. State-level COVID-19 cases represent an aggregate measure of a state’s behavior, while tweets represent individual actions and observations. The available data do not allow us to probe the reasons for the variation, but a number of possible factors could be at play. Individuals’ social distancing thoughts at a specific moment in time will be influenced by contextual information about other aspects of their lives. For example, people who tweet in support of social distancing may have in-person jobs or be in high-risk groups, which could motivate them to use social media platforms to voice support for public health measures. The stronger correlation with mobility outcomes is expected by this same argument because mobility is more directly representative of individual actions.

Additionally, tweeting norms could be systematically different across the country (eg, people in different states might be more or less likely to talk about social distancing based on the policies in place and the perceived threat of COVID-19). It is also possible that there are differences in which individuals use Twitter and have geolocation services enabled in different states. In an operational context, it is essential to combine internet data with traditional data streams in order to provide a more complete picture of an evolving scenario. Future work should focus on targeted studies to better understand potential bias.

An additional known source of bias comes from imperfect classification. Our classifiers performed similarly to other classifiers used to identify health behaviors [15], but were clearly not perfect. To account for known classifier bias, we used an adjusted bootstrapping method from Daughton and Paul [25], which generates accurate confidence intervals despite classifier error.
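To illustrate the general idea of carrying classifier error into a confidence interval, the sketch below resamples predicted labels with replacement and then redraws each label according to assumed error rates before computing the proportion. It is a generic stand-in written for illustration and should not be read as the adjusted bootstrapping method of Daughton and Paul [25]. The function name, the `precision` and `false_omission_rate` parameters, and the example values are all assumptions.

```python
import random


def adjusted_bootstrap_ci(predictions, precision, false_omission_rate,
                          n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI for the proportion of positives, propagating classifier error.

    predictions: list of 0/1 classifier outputs.
    precision: assumed P(truly positive | predicted positive).
    false_omission_rate: assumed P(truly positive | predicted negative).
    """
    rng = random.Random(seed)
    n = len(predictions)
    estimates = []
    for _ in range(n_boot):
        positives = 0
        for _ in range(n):
            pred = predictions[rng.randrange(n)]  # resample with replacement
            # Redraw the "true" label using the assumed error rates.
            p_true = precision if pred == 1 else false_omission_rate
            if rng.random() < p_true:
                positives += 1
        estimates.append(positives / n)
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical example: 30 of 100 tweets predicted positive, with assumed error rates.
predictions = [1] * 30 + [0] * 70
lower, upper = adjusted_bootstrap_ci(predictions, precision=0.8, false_omission_rate=0.05)
```

Under these assumed rates, the expected adjusted proportion is 0.3 × 0.8 + 0.7 × 0.05 = 0.275, rather than the raw predicted share of 0.30, and the interval reflects both sampling variability and label uncertainty.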

We validated our work using mobility data from Descartes Labs. However, there are a number of mobility data sources available [54]. Prior work indicates that these data have similar patterns [54], but it is possible that using a different source would produce slightly different validation results.


Behavior changes and policy decisions that occur early within an outbreak have the largest effects on disease dynamics [55,56]. Real-time conversations about health behaviors, in addition to other behavioral data sources such as mobility metrics or media consumption (eg, home television viewing [56]), could help improve overall knowledge and policy decisions in the early stages of an epidemic and could better capture dynamic changes caused by uncoordinated behavioral change. Using such data has the unique capability to inform public health decisions as an outbreak emerges, especially with respect to public health communication. The World Health Organization suggests a communication checklist to prepare for and minimize morbidity and mortality in the event of a pandemic [57,58]. The checklist emphasizes building public trust through early communication, even with incomplete information, and evaluating the impact of communication programs to assess whether recommendations are being followed. The use of social media streams as a simultaneous real-time measure of public sentiment toward messaging and a dynamic evaluation tool of communication effectiveness could be invaluable in minimizing effects from a future disease outbreak.


ARD, CDS, and DG created the labeling schema. DG, NP, TP, ARD, CWR, GF, and NYVC collected and analyzed the Twitter data. ARD, CDS, DG, IC, GN, and NM labeled the tweets. ARD and CWR built the supervised learning models, and ARD implemented the classifier-adjusted bootstrapped sampling. MB collected the mobility data and created several figures. ARD, CDS, and MB wrote the initial paper. All authors provided critical revisions to the paper. ARD led the project. Research support was provided by the Laboratory Directed Research and Development program of Los Alamos National Laboratory (project No. 20200721ER) and the US Department of Energy through the Los Alamos National Laboratory. Los Alamos National Laboratory is operated by Triad National Security, LLC, for the National Nuclear Security Administration of the US Department of Energy (Contract No. 89233218CNA000001). The Los Alamos National Laboratory Review & Approval System reporting number is LA-UR-21-20074.

Conflicts of Interest

None declared.


  1. Taylor S. The Psychology of Pandemics: Preparing for the Next Global Outbreak of Infectious Disease. Newcastle upon Tyne, UK: Cambridge Scholars Publishing; 2019.
  2. Bults M, Beaujean DJ, Richardus JH, Voeten HA. Perceptions and behavioral responses of the general public during the 2009 influenza A (H1N1) pandemic: A systematic review. Disaster Med Public Health Prep 2015 Apr;9(2):207-219.
  3. Douglas PK, Douglas DB, Harrigan DC, Douglas KM. Preparing for pandemic influenza and its aftermath: Mental health issues considered. Int J Emerg Ment Health 2009;11(3):137-144.
  4. Shultz JM, Espinel Z, Flynn BW, Hoffman Y, Cohen RE. Deep Prep: All-Hazards Disaster Behavioral Health Training. Miami, FL: Miller School of Medicine, University of Miami; 2008.
  5. Lau JTF, Yang X, Pang E, Tsui HY, Wong E, Wing YK. SARS-related perceptions in Hong Kong. Emerg Infect Dis 2005 Mar;11(3):417-424.
  6. MacDonald PDM, Holden EW. Zika and public health: Understanding the epidemiology and information environment. Pediatrics 2018 Feb 01;141(Supplement 2):S137-S145.
  7. Darrow W, Bhatt C, Rene C, Thomas L. Zika virus awareness and prevention practices among university students in Miami: Fall 2016. Health Educ Behav 2018 Dec;45(6):967-976.
  8. Mendoza C, Jaramillo G, Ant TH, Power GM, Jones RT, Quintero J, et al. An investigation into the knowledge, perceptions and role of personal protective technologies in Zika prevention in Colombia. PLoS Negl Trop Dis 2020 Jan;14(1):e0007970.
  9. White RW, Horvitz E. From health search to healthcare: Explorations of intention and utilization via query logs and user surveys. J Am Med Inform Assoc 2014;21(1):49-55.
  10. Coogan S, Sui Z, Raubenheimer D. Gluttony and guilt: Monthly trends in internet search query data are comparable with national-level energy intake and dieting behavior. Palgrave Commun 2018 Jan 9;4(1):1-9.
  11. Ayers JW, Ribisl KM, Brownstein JS. Tracking the rise in popularity of electronic nicotine delivery systems (electronic cigarettes) using search query surveillance. Am J Prev Med 2011 Apr;40(4):448-453.
  12. Eichstaedt JC, Schwartz HA, Kern ML, Park G, Labarthe DR, Merchant RM, et al. Psychological language on Twitter predicts county-level heart disease mortality. Psychol Sci 2015 Feb;26(2):159-169.
  13. Paul MJ, Dredze M. You are what you tweet: Analyzing Twitter for public health. In: Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media. 2011 Presented at: Fifth International AAAI Conference on Weblogs and Social Media; July 17-21, 2011; Barcelona, Spain   URL:
  14. McClellan C, Ali MM, Mutter R, Kroutil L, Landwehr J. Using social media to monitor mental health discussions - Evidence from Twitter. J Am Med Inform Assoc 2017 May 01;24(3):496-502.
  15. Daughton AR, Paul MJ. Identifying protective health behaviors on Twitter: Observational study of travel advisories and Zika virus. J Med Internet Res 2019 May 13;21(5):e13090.
  16. Ramanadhan S, Mendez SR, Rao M, Viswanath K. Social media use by community-based organizations conducting health promotion: A content analysis. BMC Public Health 2013 Dec 05;13:1129.
  17. Carrotte ER, Prichard I, Lim MSC. "Fitspiration" on social media: A content analysis of gendered images. J Med Internet Res 2017 Mar 29;19(3):e95.
  18. Mislove A, Lehmann S, Ahn Y, Onnela J, Rosenquist J. Understanding the demographics of Twitter users. In: Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media. 2011 Presented at: Fifth International AAAI Conference on Weblogs and Social Media; July 17-21, 2011; Barcelona, Spain   URL:
  19. Daughton AR, Chunara R, Paul MJ. Comparison of social media, syndromic surveillance, and microbiologic acute respiratory infection data: Observational study. JMIR Public Health Surveill 2020 Apr 24;6(2):e14986.
  20. Engle S, Stromme J, Zhou A. Staying at home: Mobility effects of COVID-19. SSRN J 2020:1-16 (forthcoming).
  21. Buckee CO, Balsari S, Chan J, Crosas M, Dominici F, Gasser U, et al. Aggregated mobility data could help fight COVID-19. Science 2020 Apr 10;368(6487):145-146.
  22. Chen E, Lerman K, Ferrara E. Tracking social media discourse about the COVID-19 pandemic: Development of a public coronavirus Twitter data set. JMIR Public Health Surveill 2020 May 29;6(2):e19273.
  23. Lamb A, Paul MJ, Dredze M. Separating fact from fear: Tracking flu infections on Twitter. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2013 Presented at: 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; June 9-14, 2013; Atlanta, GA p. 789-795.
  24. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine learning in Python. J Mach Learn Res 2011 Nov;12:2825-2830.
  25. Daughton AR, Paul MJ. Constructing accurate confidence intervals when aggregating social media data for public health monitoring. In: Proceedings of the International Workshop on Health Intelligence (W3PHAI 2019). 2019 Presented at: International Workshop on Health Intelligence (W3PHAI 2019); January 27-February 1, 2019; Honolulu, HI.
  26. Data for mobility changes in response to COVID-19. GitHub. Santa Fe, NM: Descartes Labs   URL: [accessed 2021-05-06]
  27. Warren MS, Skillman SW. Mobility changes in response to COVID-19. ArXiv. Preprint posted online on March 31, 2020.
  28. Coronavirus (Covid-19) data in the United States. GitHub. New York, NY: The New York Times   URL: [accessed 2021-05-06]
  29. State “shelter-in-place” and “stay-at-home” orders. FINRA.   URL: [accessed 2020-12-23]
  30. Mervosh S, Lu D, Swales V. See which states and cities have told residents to stay at home. The New York Times. 2020 Apr 20.   URL: [accessed 2021-01-05]
  31. Chandrasekaran N, Marotta M, Taldone S, Curry C. Perceptions of community risk and travel during pregnancy in an area of Zika transmission. Cureus 2017 Jul 26;9(7):e1516.
  32. Blaikie N. Designing Social Research: The Logic of Anticipation. 2nd edition. Cambridge, UK: Polity Press; 2009.
  33. Chan EH, Sahai V, Conrad C, Brownstein JS. Using web search query data to monitor dengue epidemics: A new model for neglected tropical disease surveillance. PLoS Negl Trop Dis 2011 May;5(5):e1206.
  34. Culotta A. Towards detecting influenza epidemics by analyzing Twitter messages. In: Proceedings of the First Workshop on Social Media Analytics (SOMA '10). 2010 Presented at: First Workshop on Social Media Analytics (SOMA '10); July 25, 2010; Washington, DC p. 115-122.
  35. Watad A, Watad S, Mahroum N, Sharif K, Amital H, Bragazzi NL, et al. Forecasting the West Nile virus in the United States: An extensive novel data streams-based time series analysis and structural equation modeling of related digital searching behavior. JMIR Public Health Surveill 2019 Feb 28;5(1):e9176.
  36. Hassan MS, Halbusi HA, Najem A, Razali A, Williams KA, Mustamil NM. Impact of risk perception on trust in government and self-efficiency during COVID-19 pandemic: Does social media content help users adopt preventative measures? Research Square. Preprint posted online on July 16, 2020.
  37. Oh SH, Lee SY, Han C. The effects of social media use on preventive behaviors during infectious disease outbreaks: The mediating role of self-relevant emotions and public risk perception. Health Commun 2020 Feb 16:1-10.
  38. Ding H, Zhang J. Social media and participatory risk communication during the H1N1 flu epidemic: A comparative study of the United States and China. China Media Res 2010;6(4):80-91.
  39. Abd-Alrazaq A, Alhuwail D, Househ M, Hamdi M, Shah Z. Top concerns of tweeters during the COVID-19 pandemic: Infoveillance study. J Med Internet Res 2020 Apr 21;22(4):e19016.
  40. Singh L, Bansal S, Bode L, Budakb C, Chic G, Kawintiranona K, et al. A first look at COVID-19 information and misinformation sharing on Twitter. ArXiv. Preprint posted online on March 31, 2020.
  41. Saad M, Hassan M, Zaffar F. Towards characterizing COVID-19 awareness on Twitter. ArXiv. Preprint posted online on May 21, 2020.
  42. Bailey M, Cao R, Kuchler T, Stroebel J, Wong A. Social connectedness: Measurement, determinants, and effects. J Econ Perspect 2018 Aug 01;32(3):259-280.
  43. Ahmed W, Vidal-Alaball J, Downing J, López Seguí F. COVID-19 and the 5G conspiracy theory: Social network analysis of Twitter data. J Med Internet Res 2020 May 06;22(5):e19458.
  44. Broniatowski DA, Paul MJ, Dredze M. National and local influenza surveillance through Twitter: An analysis of the 2012-2013 influenza epidemic. PLoS One 2013;8(12):e83672.
  45. Gerts D, Shelley CD, Parikh N, Pitts T, Watson Ross C, Fairchild G, et al. "Thought I'd share first" and other conspiracy theory tweets from the COVID-19 infodemic: Exploratory study. JMIR Public Health Surveill 2021 Apr 14;7(4):e26527.
  46. Chan AKM, Nickson CP, Rudolph JW, Lee A, Joynt GM. Social media for rapid knowledge dissemination: Early experience from the COVID-19 pandemic. Anaesthesia 2020 Dec;75(12):1579-1582.
  47. Sadique MZ, Edmunds WJ, Smith RD, Meerding WJ, de Zwart O, Brug J, et al. Precautionary behavior in response to perceived threat of pandemic influenza. Emerg Infect Dis 2007 Sep;13(9):1307-1313.
  48. De Choudhury CM, Gamon M, Counts S, Horvitz S. Predicting depression via social media. In: Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media. 2013 Presented at: Seventh International AAAI Conference on Weblogs and Social Media; July 8-11, 2013; Boston, MA   URL:
  49. Kofman YB, Garfin DR. Home is not always a haven: The domestic violence crisis amid the COVID-19 pandemic. Psychol Trauma 2020 Aug;12(S1):S199-S201.
  50. Vindegaard N, Benros ME. COVID-19 pandemic and mental health consequences: Systematic review of the current evidence. Brain Behav Immun 2020 Oct;89:531-542.
  51. Ghenai A, Mejova Y. Catching Zika fever: Application of crowdsourcing and machine learning for tracking health misinformation on Twitter. In: Proceedings of the 2017 IEEE International Conference on Healthcare Informatics (ICHI). 2017 Presented at: 2017 IEEE International Conference on Healthcare Informatics (ICHI); August 23-26, 2017; Park City, UT p. 518.
  52. Salathé M, Khandelwal S. Assessing vaccination sentiments with online social media: Implications for infectious disease dynamics and control. PLoS Comput Biol 2011 Oct;7(10):e1002199.
  53. Porcher S, Renault T. Social distancing beliefs and human mobility: Evidence from Twitter. ArXiv. Preprint posted online on August 10, 2020.
  54. Huang X, Li Z, Jiang Y, Ye X, Deng C, Zhang J, et al. The characteristics of multi-source mobility datasets and how they reveal the luxury nature of social distancing in the US during the COVID-19 pandemic. Int J Digit Earth 2021 Feb 17;14(4):424-442.
  55. Schwarzinger M, Flicoteaux R, Cortarenoda S, Obadia Y, Moatti J. Low acceptability of A/H1N1 pandemic vaccination in French adult population: Did public health policy fuel public dissonance? PLoS One 2010 Apr 16;5(4):e10199.
  56. Springborn M, Chowell G, MacLachlan M, Fenichel EP. Accounting for behavioral responses during a flu epidemic using home television viewing. BMC Infect Dis 2015 Jan 23;15:21.
  57. World Health Organization, Department of Communicable Disease Surveillance and Response. WHO Guidelines for Epidemic Preparedness and Response to Measles Outbreaks. Geneva, Switzerland: World Health Organization; 1999 May.   URL: [accessed 2016-07-27]
  58. World Health Organization, Department of Communicable Disease Surveillance and Response, Global Influenza Programme. WHO Checklist for Influenza Pandemic Preparedness Planning. Geneva, Switzerland: World Health Organization; 2005.   URL: [accessed 2020-05-06]

API: application programming interface

Edited by C Basch; submitted 11.01.21; peer-reviewed by X Zhou, L Guo, Z Jin; comments to author 01.03.21; revised version received 08.03.21; accepted 17.04.21; published 25.05.21


©Ashlynn R Daughton, Courtney D Shelley, Martha Barnard, Dax Gerts, Chrysm Watson Ross, Isabel Crooker, Gopal Nadiga, Nilesh Mukundan, Nidia Yadira Vaquera Chavez, Nidhi Parikh, Travis Pitts, Geoffrey Fairchild. Originally published in the Journal of Medical Internet Research, 25.05.2021.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication, as well as this copyright and license information must be included.