This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
Social media is a rich source of information about people’s reactions to social issues. As COVID-19 has affected people’s lives, it is essential to capture how people react to public health interventions and to understand their concerns.
We aim to investigate people’s reactions and concerns about COVID-19 in North America, especially in Canada.
We analyzed COVID-19–related tweets using topic modeling and aspect-based sentiment analysis (ABSA), and interpreted the results with public health experts. To generate insights on the effectiveness of specific public health interventions for COVID-19, we compared timelines of topics discussed with the timing of implementation of interventions, synergistically including information on people’s sentiment about COVID-19–related aspects in our analysis. In addition, to further investigate anti-Asian racism, we compared timelines of sentiments for Asians and Canadians.
Topic modeling identified 20 topics, and public health experts interpreted the topics based on the top-ranked words and representative tweets for each topic. The interpretation and timeline analysis showed that the discovered topics and their trends are highly related to public health promotions and interventions such as physical distancing, border restrictions, handwashing, staying home, and face coverings. After training ABSA on the data with a human in the loop, we obtained 545 aspect terms (eg, “vaccines,” “economy,” and “masks”) and 60 opinion terms such as “infectious” (negative) and “professional” (positive), which were used to infer sentiments toward 20 key aspects selected by public health experts. The results showed negative sentiments related to the overall outbreak, misinformation, and Asians, and positive sentiments related to physical distancing.
Analyses using natural language processing techniques with domain expert involvement can produce useful information for public health. This study is the first to analyze COVID-19–related tweets in Canada in comparison with tweets in the United States by using topic modeling and human-in-the-loop domain-specific ABSA. This kind of information could help public health agencies understand public concerns and which public health messages resonate with the populations who use Twitter, which can inform the design of policies for new interventions.
Worldwide, more than 31 million people have been diagnosed with COVID-19, and more than 1 million people have died as of October 12, 2020 [
During this pandemic, people have been using social media such as Twitter to share news, information, opinions, and emotions about COVID-19 [
Topic modeling and sentiment analysis have been widely used to identify issues and people’s opinions in public health and are being used to understand COVID-19–related issues as well (
Related work on topic modeling and sentiment analysis on COVID-19–related data.
Authors | Source | Posters | Time | Location | Language | Sentiment |
Liu et al [ | News articles | News reporters | January 1 to February 20, 2020 | Not specified | Chinese | No |
Dong et al [ | Research papers | Researchers | Unknown to March 20, 2020 | Not specified | English | No |
Stokes et al [ | Reddit posts | Public | March 3-31, 2020 | Not specified | English | No |
Sha et al [ | Tweets | State governors, presidential cabinet members, and the president | January 1 to April 7, 2020 | US | English | No |
Hosseini et al [ | Tweets | Public | March 13 to April 19, 2020 | Iran | Persian (Farsi) | No |
Sharma et al [ | Tweets | Public | March 1-30, 2020 | Not specified | English | Yes |
Odlum et al [ | Tweets | Public (African Americans) | January 21 to May 3, 2020 | Not specified | English | Yes |
Wang et al [ | Tweets | Public | March 5 to April 2, 2020 | California and New York, US | English | Yes |
Abd-Alrazaq et al [ | Tweets | Public | February 2 to March 15, 2020 | Not specified | English | Yes |
Ordun et al [ | Tweets | Public | March 24 to April 9, 2020 | Not specified | English, Spanish, Italian, French, and Portuguese | No |
This study | Tweets | Public | January 21 to May 31, 2020 | Canada and US | English | Yes |
Our study aims to investigate Twitter users’ reactions to COVID-19 in North America, especially in Canada. We analyzed COVID-19–related tweets with topic modeling and human-in-the-loop aspect-based sentiment analysis (ABSA), and interpreted the results with public health experts. Using ABSA based on domain-specific aspect and opinion terms, we examined the sentiment of tweets about COVID-19–related aspects such as social distancing and masks. The key advantage of our study is that public health experts were actively involved in the computational process with the specific goal of informing public health interventions: they interpreted the results, and the human-in-the-loop ABSA approach allowed them to curate the domain-specific aspect and opinion terms. To the best of our knowledge, we are the first to directly identify the sentiment toward COVID-19–specific aspects.
We used a public Twitter data set about the COVID-19 pandemic, collected by Chen et al [
For our study, we collected tweets posted through the end of May 2020, the end of the first wave in Canada, because we aimed to investigate people’s reactions and concerns in the early days of COVID-19. We selected tweets whose location was Canada or the United States.
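The date and location selection can be sketched as a simple filter. This is a minimal illustration, assuming each tweet is a hydrated JSON object in the Twitter API v1.1 format with a `created_at` timestamp and a `place.country_code` field; the helper names (`keep_tweet`, `filter_tweets`) are hypothetical, and the English-language filtering step (tweet metadata plus spacy-langdetect) is not shown.

```python
from datetime import datetime, timezone

# Study window from the article (January 21 to May 31, 2020); END is exclusive.
START = datetime(2020, 1, 21, tzinfo=timezone.utc)
END = datetime(2020, 6, 1, tzinfo=timezone.utc)
COUNTRIES = {"CA", "US"}  # country codes as they appear in the tweet "place" object


def parse_created_at(value):
    """Parse a Twitter v1.1 timestamp such as "Wed Mar 25 14:03:11 +0000 2020"."""
    return datetime.strptime(value, "%a %b %d %H:%M:%S %z %Y")


def keep_tweet(tweet):
    """True if the tweet has a place in Canada or the US and falls in the study window."""
    place = tweet.get("place") or {}  # "place" is often missing or null
    if place.get("country_code") not in COUNTRIES:
        return False
    return START <= parse_created_at(tweet["created_at"]) < END


def filter_tweets(tweets):
    """Return only the tweets that pass the location and date filter."""
    return [t for t in tweets if keep_tweet(t)]
```

Using a half-open window (`START <= t < END`) keeps every tweet posted on May 31 without needing an end-of-day timestamp.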
Among the 372,711 tweets in total (Canada: n=30,235, US: n=342,476), we only included tweets written in English using tweet metadata and the spacy-langdetect toolkit [
We first discovered topics in COVID-19–related tweets using a widely used topic modeling approach, latent Dirichlet allocation (LDA) [
To discover topics and track the topic change over time, we constructed topic models on our Twitter data using LDA implementation in the scikit-learn package [
The topics generated by LDA were interpreted and labeled by two public health experts, both with doctoral training and extensive experience in public health. In the initial phase of the study, before a final model was chosen, they discussed the results to build consensus. After the final output was obtained, the junior expert interpreted and labeled the topics first, and the senior expert reviewed the labels.
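The LDA step that produces the topics the experts labeled can be illustrated with a small, self-contained sampler. The study used the scikit-learn implementation of LDA, which relies on variational inference; the sketch below instead uses collapsed Gibbs sampling, a different but standard way to fit the same model, because it makes the document-topic proportions (`theta`) and topic-word distributions (`phi`) easy to see. The hyperparameters `alpha`, `beta`, and `n_iter` are illustrative assumptions, not the study's settings, and `top_words` mimics the top-ranked word lists used for interpretation.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA.

    docs: list of token lists. Returns (theta, phi, vocab), where theta[d][k]
    is the share of topic k in document d and phi[k][w] is the probability
    of vocab[w] under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # document-topic counts
    nkw = [[0] * V for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                       # total tokens per topic
    z = []                                    # topic assignment of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1

    theta = [
        [(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [[(nkw[k][w] + beta) / (nk[k] + V * beta) for w in range(V)] for k in range(n_topics)]
    return theta, phi, vocab


def top_words(phi, vocab, n=10):
    """Top-n words per topic, the kind of list experts read to label topics."""
    ranked = []
    for row in phi:
        order = sorted(range(len(vocab)), key=lambda w: row[w], reverse=True)
        ranked.append([vocab[w] for w in order[:n]])
    return ranked
```

On real data the tweets would first be tokenized and stop words removed; the study fit 20 topics, whereas a toy corpus only supports a handful.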
To analyze the dynamics of public health relevant topics, we investigated the change in the prevalence of the topics over time. More specifically, we performed a basic analysis based on an examination of the estimates of
To capture sentiment expressed in tweets toward important aspects of COVID-19, we used ABSA. In our study, aspects can include public health interventions or issues associated with COVID-19, such as “social-distancing,” “reopening,” and “masks.” We investigated people’s opinions (positive and negative) toward these aspects.
We used ABSApp, a weakly-supervised ABSA system [
The two public health experts who labeled the topics also edited the terms so that the aspect terms related to important public health interventions or issues of interest to them, and the opinion terms were words that describe sentiment toward those public health terms. As in the topic interpretation process, the junior expert edited the terms first, and the senior expert reviewed them.
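Once aspect and opinion lexicons exist, they can be used to infer the sentiment of each aspect mention. ABSApp extracts aspect-opinion pairs using rules over syntactic structure; the sketch below swaps in a much simpler heuristic (opinion terms within a fixed token window of an aspect mention, inside the same sentence) purely to show how edited lexicons drive inference. The lexicon entries “masks,” “vaccines,” “economy,” “infectious,” and “professional” come from the article; the other entries, the window size, and the function names are hypothetical.

```python
import re

# Toy lexicons. Opinion polarity: +1 positive, -1 negative.
ASPECTS = {"masks", "vaccines", "economy", "social distancing"}
OPINIONS = {"infectious": -1, "useless": -1, "professional": 1, "great": 1}


def tokenize(sentence):
    return re.findall(r"[a-z']+", sentence.lower())


def aspect_sentiments(text, aspects=ASPECTS, opinions=OPINIONS, window=3):
    """Return (aspect, polarity) pairs for each aspect mention that has
    opinion terms within `window` tokens of it in the same sentence."""
    results = []
    for sentence in re.split(r"[.!?]+", text):
        tokens = tokenize(sentence)
        for aspect in sorted(aspects):
            parts = aspect.split()  # multiword aspects are matched token by token
            n = len(parts)
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] != parts:
                    continue
                left = tokens[max(0, i - window):i]
                right = tokens[i + n:i + n + window]
                score = sum(opinions.get(t, 0) for t in left + right)
                if score > 0:
                    results.append((aspect, "positive"))
                elif score < 0:
                    results.append((aspect, "negative"))
    return results
```

For example, `aspect_sentiments("Masks are useless. I think vaccines seem great!")` yields a negative pair for “masks” and a positive pair for “vaccines”; splitting on sentence boundaries keeps the two opinions from cancelling out.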
Mobility and case counts for Canada from February 15 to May 31, 2020. Google mobility data are available only from February 15. pharm.: pharmacy.
Mobility and case counts for the United States from February 15 to May 31, 2020. Google mobility data are available only from February 15. pharm.: pharmacy.
The discovered topics were highly related to public health promotions and interventions such as physical distancing, border restrictions, handwashing, staying home, and face coverings, as shown in
Age and COVID-19 transmission, as well as time
Initial outbreak in Wuhan
US President Trump’s statement
Thank you notes related to the pandemic mixed with discussion of cruise ship outbreaks
Air travel and regional border restrictions and outbreaks
Age and COVID-19 transmission, as well as time
US President Trump’s statement
Early debate on whether COVID-19 is like the flu
Initial outbreak in Wuhan
The need to stay home and the impact of COVID-19 on essential workers and family
The most prevalent topics in Canada and the United States showed some differences, as can be seen in
Based on the mean
Changes in six public health–relevant topics over time. T1: social and physical distancing; T2: air travel and regional border restrictions and outbreaks; T3: handwashing and preventive measures; T4: the need to stay home and impact of COVID-19 on essential workers and family; T5: number of tests and cases; T6: masks and face coverings.
Second, the topic trends are closely related to public health interventions. For example, the topic about social distancing (T1) started to increase in early March 2020, after social distancing measures were enacted, and handwashing (T3) also began to be emphasized at that time. The topic about the need to stay home (T4) started to increase around the end of March: in Canada, the Federal Quarantine Order was issued on March 24, and in the United States, many states issued stay-at-home orders around the same time. Discussion about the number of tests and cases (T5) gradually increased. Interestingly, the topic about masks and face coverings (T6) decreased slightly from March, possibly because public health agencies in both countries announced their positions on masks around that time.
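Comparing topic timelines with intervention dates starts from a daily prevalence series. The sketch below assumes that the prevalence of a topic on a given day is the mean of that topic's share across the day's tweets (an assumption; the article's exact definition is truncated in this copy) and adds one simple, hypothetical way to quantify a shift: the mean prevalence in a window before versus after an intervention date. The article itself compares the timelines descriptively; the window length and function names are illustrative.

```python
from collections import defaultdict
from datetime import date


def daily_prevalence(doc_dates, theta):
    """Mean document-topic proportions per day: {day: [mean share of topic k]}."""
    n_topics = len(theta[0])
    sums = defaultdict(lambda: [0.0] * n_topics)
    counts = defaultdict(int)
    for day, props in zip(doc_dates, theta):
        for k, p in enumerate(props):
            sums[day][k] += p
        counts[day] += 1
    return {day: [s / counts[day] for s in sums[day]] for day in sorted(sums)}


def _mean(values):
    return sum(values) / len(values) if values else float("nan")


def pre_post_means(prevalence, topic, intervention_day, window_days=14):
    """Mean prevalence of one topic in the window_days before the intervention
    versus the window_days starting on the intervention day."""
    before = [v[topic] for d, v in prevalence.items()
              if 0 < (intervention_day - d).days <= window_days]
    after = [v[topic] for d, v in prevalence.items()
             if 0 <= (d - intervention_day).days < window_days]
    return _mean(before), _mean(after)


# Example intervention date from the text: Canada's Federal Quarantine Order.
QUARANTINE_ORDER_CA = date(2020, 3, 24)
```

A descriptive before/after difference like this does not establish that an intervention caused the change in discussion; it only summarizes what the timeline plots show.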
After training ABSApp on the tweet data, we obtained 806 aspect terms and 211 opinion terms. Manual editing of the lexicons resulted in 545 aspect terms (eg, “vaccines,” “economy,” and “masks”) and 60 domain-specific opinion terms such as “infectious” (negative) and “professional” (positive). These edited terms were then used to infer sentiments for 20 key aspects selected by public health experts. The results are shown in
Aspect-based sentiment analysis results. x-axis: selected aspects; y-axis: number of positive occurrences and number of negative occurrences (log scale).
Aspect-based sentiment analysis results for selected aspects. y-axis: ratio of the number of positive occurrences to the number of negative occurrences.
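The two summaries plotted in these figures (positive and negative counts per aspect, and their ratio) can be computed from any list of aspect-level predictions. This sketch assumes the predictions are `(aspect, polarity)` pairs with polarity `"positive"` or `"negative"`; the function name is hypothetical, and an aspect with no negative occurrences is given an infinite ratio rather than being dropped.

```python
from collections import Counter


def sentiment_ratios(pairs):
    """pairs: iterable of (aspect, polarity). Returns
    {aspect: (n_positive, n_negative, n_positive / n_negative)}."""
    pos, neg = Counter(), Counter()
    for aspect, polarity in pairs:
        if polarity == "positive":
            pos[aspect] += 1
        elif polarity == "negative":
            neg[aspect] += 1
    summary = {}
    for aspect in sorted(set(pos) | set(neg)):
        p, n = pos[aspect], neg[aspect]
        summary[aspect] = (p, n, p / n if n else float("inf"))
    return summary
```

Plotting the raw counts on a log scale, as in the first figure, keeps rare and frequent aspects on the same axis, while the ratio makes aspects of very different frequency comparable.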
To further investigate possible stigma toward Asians, we examined words that frequently co-occurred with the aspect words “Chinese” and “Asians.” The top-ranked words in negative tweets included “virus,” “racist,” “racism,” “fucking,” “attacks,” “ass,” “assaults,” “blame,” and “hate,” and the top-ranked words in positive tweets included “fucking,” “racism,” “respectful,” “kind,” “street,” “disgusting,” and “crying.” We list sample tweets that show positive and negative sentiments in
“You should not be afraid of Asians but you should be absolutely terrified of the PEOPLE THAT DONT COVER THEIR MOUTHS/NOSES DURING A COUGH AND/OR SNEEZE.”
“French Asians hit back at racism with 'I'm not a virus”
“Y’all realize that the coronavirus ain’t exclusive to Chinese people right?? mfs look for any excuse to be racist bruh”
“Oriental Asians always starting some fuckin outbreak...”
“Yea I’m holding my breath round all Asians till this coronavirus shit clear up call it wat u think it is.”
“No Asians allowed in my shop after the outbreak.”
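The co-occurrence analysis above can be sketched as word counting restricted to tweets that mention an aspect word, split by the tweet's sentiment label. This is a minimal illustration, assuming each tweet already carries a polarity label; the tokenizer, the short stop word list, and the function name are hypothetical simplifications.

```python
import re
from collections import Counter, defaultdict

# Hypothetical minimal stop word list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are", "at", "with"}


def cooccurring_words(labeled_tweets, aspect_words, top_n=10):
    """labeled_tweets: iterable of (text, polarity); aspect_words: set of
    lowercase words. Returns {polarity: [(word, count), ...]} for tweets
    that mention at least one aspect word."""
    counts = defaultdict(Counter)
    for text, polarity in labeled_tweets:
        tokens = re.findall(r"[a-z']+", text.lower())
        if not aspect_words & set(tokens):
            continue  # the tweet does not mention any aspect word
        counts[polarity].update(
            t for t in tokens if t not in aspect_words and t not in STOPWORDS
        )
    return {polarity: c.most_common(top_n) for polarity, c in counts.items()}
```

Because the counts are conditioned only on the tweet-level label, a word such as “racism” can rank highly in both lists: a tweet condemning racism may be labeled positive while still containing the word.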
Sentiment changes over time for Asians and Canadians. Y-axis shows the number of positive occurrences, number of negative occurrences, and number of total occurrences. neg: negative; pos: positive.
In this study, using topic modeling and ABSA on Twitter data from North America, we identified various topics related to physical distancing, travel and border restrictions, handwashing and preventive measures, face masks, stay-at-home orders, and the number of cases and tests. Travel and border restrictions were major discussion points in February 2020 and were later overtaken by other topics such as physical distancing. ABSA identified various negative themes related to the overall outbreak, anti-Asian racism, and misinformation, and positive sentiment related to physical distancing. These data show that Twitter users focused on discussing and reacting to public health interventions during the first phase of the pandemic.
This kind of information could help public health agencies understand public concerns and which public health messages are resonating among the populations who use Twitter. For example, public health agencies in North America have focused their messaging on encouraging hand hygiene, limiting physical contact when sick, and staying home to prevent infection. This messaging is echoed in the topics around handwashing, staying home, mask wearing, and social or physical distancing.
For public health decision makers, it would be beneficial to have a pipeline in which a computational model runs continuously on streaming social media data and its results are reviewed by public health experts. Their findings could then inform public health education, communication, and messaging that address misinformation related to the topics.
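Such a continuous review loop could be prototyped as a batch monitor over a stream of aspect-level predictions. The sketch below is hypothetical throughout: it assumes the stream yields `(aspect, polarity)` pairs from some aspect-level sentiment classifier, and the batch size and negative-share threshold are illustrative values, not recommendations. Flagged aspects are meant to be handed to public health experts for review, not acted on automatically.

```python
from collections import Counter
from itertools import islice


def monitor(stream, batch_size=1000, neg_share_threshold=0.6):
    """Read (aspect, polarity) pairs in fixed-size batches and yield
    (batch_number, flagged_aspects), where an aspect is flagged when its
    share of negative occurrences in the batch exceeds the threshold."""
    pairs = iter(stream)
    batch_no = 0
    while True:
        batch = list(islice(pairs, batch_size))
        if not batch:
            return  # stream exhausted
        pos, neg = Counter(), Counter()
        for aspect, polarity in batch:
            if polarity == "positive":
                pos[aspect] += 1
            elif polarity == "negative":
                neg[aspect] += 1
        flagged = sorted(
            aspect for aspect in set(pos) | set(neg)
            if neg[aspect] / (pos[aspect] + neg[aspect]) > neg_share_threshold
        )
        yield batch_no, flagged
        batch_no += 1
```

Because `monitor` is a generator, it works equally on a finite list and on an unbounded iterator fed by a live collection process.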
Risk communication and knowledge translation in practice is a combination of proactive and reactive messaging [
Our findings that tweets reflect public health interventions are aligned with other studies. Abd-Alrazaq et al [
Depending on tweets used for analysis, other studies report some interesting topics different from topics drawn from tweets in Canada. For example, topics related to government and political issues were observed in the studies on tweets by US governors [
Our ABSA infers sentiment toward specific aspects by taking sentence structure into account, whereas most prior work on sentiment analysis assigns a single sentiment to an entire text. For this reason, those approaches are generally not suitable for identifying the sentiment toward a given aspect. For instance, Wang et al [
However, our ABSA results, especially related to racism and discrimination against Asians, were also observed in other research using different study methods. Zhu [
Our study had the following limitations. We used only a small set of Twitter data because tweets with location information made up only a small fraction of the whole data set; this limitation also affects other studies that use social media data in a similar way. Moreover, it should be noted that the geotagged tweet data set comprises statements from a nonuniform subsample of the population. According to Gore et al [
In our data set, we looked at location at the country level (ie, Canada or the United States). However, Gore et al [
Another possible bias comes from not knowing who tweeted from the locations. Padilla et al [
In general, if our proposed pipeline were deployed in practice, all these biases would need to be carefully considered and addressed.
In addition, although ABSA captures more nuanced sentiments toward specific aspects, it shares a limitation of current state-of-the-art sentiment analysis techniques: it cannot properly handle figurative language such as sarcasm. However, because our proposed approach can process large amounts of Twitter data, it may be relatively robust to the noise generated by these complex pragmatic phenomena.
In this paper, we present the exploratory results of topic modeling and ABSA on COVID-19–related tweets in North America, especially in Canada. We compared the topic modeling and ABSA results for Canada and the United States, and showed how topics related to public health interventions changed over time. Our analyses demonstrated that Twitter conversations about COVID-19 are highly aligned with public health interventions. In our study, public health experts were actively involved in the computational process as well as in the interpretation of the results. The human-in-the-loop ABSA allowed manual editing of the aspect and opinion lexicons; as a result, by leveraging these domain-specific lexicons, our analysis revealed sentiments toward the aspects that public health experts were interested in. Our results suggest that monitoring Twitter users’ reactions to COVID-19–related aspects can be beneficial for public health policy makers.
Latent Dirichlet allocation–generated topics and their interpretations.
ABSA: aspect-based sentiment analysis
LDA: latent Dirichlet allocation
None declared.