This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
COVID-19 vaccines are one of the most effective preventive strategies for containing the pandemic. Having a better understanding of the public’s conceptions of COVID-19 vaccines may aid in the effort to promptly and thoroughly vaccinate the community. However, because no empirical research has yet fully explored the public’s vaccine awareness through sentiment-based topic modeling, little is known about the evolution of public attitude since the rollout of COVID-19 vaccines.
In this study, we specifically focused on tweets about COVID-19 vaccines (Pfizer, Moderna, AstraZeneca, and Johnson & Johnson) after vaccines became publicly available. We aimed to explore the overall sentiments and topics of tweets about COVID-19 vaccines, as well as how such sentiments and main concerns evolved.
We collected 1,122,139 tweets related to COVID-19 vaccines from December 14, 2020, to April 30, 2021, using Twitter’s application programming interface. We removed retweets and duplicate tweets to avoid data redundancy, which resulted in 857,128 tweets. We then applied sentiment-based topic modeling by using the compound score to determine sentiment polarity and the coherence score to determine the optimal topic number for different sentiment polarity categories. Finally, we calculated the topic distribution to illustrate the topic evolution of main concerns.
Overall, 398,661 tweets (46.51%) were positive, 204,084 (23.81%) were negative, 245,976 (28.70%) were neutral, 6899 (0.80%) were highly positive, and 1508 (0.18%) were highly negative. The main topics of positive and highly positive tweets were planning for getting vaccination (251,979/405,560, 62.13%), getting vaccinated (76,029/405,560, 18.75%), and vaccine information and knowledge (21,127/405,560, 5.21%). The main concerns in negative and highly negative tweets were vaccine hesitancy (115,206/205,592, 56.04%), extreme side effects of the vaccines (19,690/205,592, 9.58%), and vaccine supply and rollout (17,154/205,592, 8.34%). During the study period, negative sentiment trends were stable, while positive sentiments could be easily influenced. Topic heatmap visualization demonstrated how main concerns changed during the current widespread vaccination campaign.
To the best of our knowledge, this is the first study to evaluate public COVID-19 vaccine awareness and awareness trends on social media with automated sentiment-based topic modeling after vaccine rollout. Our results can help policymakers and research communities track public attitudes toward COVID-19 vaccines and help them make decisions to promote the vaccination campaign.
COVID-19 vaccines are one of the most effective preventive strategies for containing the pandemic and restoring normal life [
Generally, characterizing public vaccine attitudes as part of public health surveillance can be achieved via social media–based text mining or other traditional methodologies, such as conducting surveys or experiments. Social media–based text mining has become increasingly popular because of its effectiveness and efficiency; the major merit of this big data analysis is that it addresses several of the limitations of traditional methodologies, such as the inability to track real-time trends [
We aimed to combine sentiment analysis and topic modeling in order to address the following research questions: What are the general sentiments on COVID-19 vaccines? What are the topics that shape the sentiments? How do concerns (ie, topics with negative sentiments) evolve over time?
We collected COVID-19 vaccine–related tweets containing a variety of predefined hashtags, including #CovidVaccine, #GetVaccinated, #covid19vaccine, #vaccination, #AstraZeneca, #Johnson & Johnson, #Pfizer and #Moderna, from December 14, 2020 (after the first COVID-19 vaccine in the world was approved) to April 30, 2021. We collected 1,122,139 tweets (
Tweet hashtags.
Hashtag | Tweets (N=1,122,139), n |
#CovidVaccine | 345,537 |
#GetVaccinated | 73,817 |
#covid19vaccine | 130,043 |
#vaccination | 132,327 |
#AstraZeneca | 126,954 |
#Johnson & Johnson | 211,731 |
#Pfizer | 61,979 |
#Moderna | 39,751 |
Data processing workflow. LDA: latent Dirichlet allocation; VADER: Valence Aware Dictionary for Sentiment Reasoning.
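The first step of the workflow, removing retweets and duplicate tweets, can be illustrated with a minimal sketch. The criteria used here are assumptions made for illustration only: a retweet is taken to be any text starting with the conventional "RT @" prefix, and duplicates are matched after lowercasing and collapsing whitespace. The function name and the sample tweets are hypothetical, and the study's actual filtering rules may have differed.

```python
def remove_retweets_and_duplicates(tweets):
    """Drop retweets and duplicate texts, keeping first occurrences in order."""
    seen = set()
    kept = []
    for text in tweets:
        # Assumption: duplicates are compared case-insensitively,
        # with runs of whitespace collapsed to a single space.
        normalized = " ".join(text.split()).lower()
        # Assumption: retweets carry the conventional "RT @user:" prefix.
        if normalized.startswith("rt @"):
            continue
        if normalized in seen:
            continue
        seen.add(normalized)
        kept.append(text)
    return kept


# Hypothetical sample tweets, not taken from the study data.
sample_tweets = [
    "Got my first dose today #CovidVaccine",
    "RT @someone: Got my first dose today #CovidVaccine",
    "got my first dose today   #covidvaccine",
    "Appointment booked for next week #GetVaccinated",
]
cleaned = remove_retweets_and_duplicates(sample_tweets)
print(len(sample_tweets), "->", len(cleaned))
```

In the study, this filtering step reduced the data set from 1,122,139 to 857,128 tweets.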
We used the Valence Aware Dictionary for Sentiment Reasoning (VADER) lexicon for analysis. During preprocessing, we did not remove the hashtag content because it often contained meaningful information such as the brand of the vaccine. VADER is a rule-based sentiment analysis tool that has been shown to perform as well as or even better than other sentiment analysis tools on social media texts in most cases, since it is specifically attuned to sentiments expressed on social media [
We classified each tweet into 1 of 5 groups (
Sentiment polarity examples.
Sentiment polarity | Example |
Highly positive | “thank god vaccination vaccinessavelives vaccineswork” |
Positive | “it s an exciting day with the arrival of the first coronavirusvaccine it gives me great hope for 2021 covid19vaccine” |
Highly negative | “it s fake you re all stupid covidvaccine” |
Negative | “how do we know that after 6 9 months there are no adverse effects of the vaccine or that it s ineffective and what s the response if in the event these emergency approvals have larger ramifications any mechanism being put together covid_19 covid19vaccine” |
Neutral | “help is on the way 1st doses of covid19vaccine arrived in north carolina initial vaccine supply is limited and will go to a small number of public health and hospital workers at high risk of exposure more doses are on the way but until then practice your 3ws” |
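A mapping from a VADER compound score (a value between -1 and 1) to the 5 polarity groups could be sketched as follows. The ±0.05 neutral band is the cut-off conventionally recommended in the VADER documentation; the ±0.9 boundary for the "highly" groups is a purely hypothetical value chosen for illustration, since the thresholds used in the study are not stated in the text above. The compound score itself would typically come from a VADER analyzer and is taken here as a plain number.

```python
# VADER's conventional neutral band for the compound score.
NEUTRAL_BAND = 0.05
# Hypothetical cut-off for the "highly" groups (an assumption, not the
# value used in the study).
HIGH_CUTOFF = 0.9


def polarity_group(compound):
    """Map a compound score in [-1, 1] to one of 5 polarity groups."""
    if not -1.0 <= compound <= 1.0:
        raise ValueError("compound score must lie in [-1, 1]")
    if compound >= HIGH_CUTOFF:
        return "highly positive"
    if compound >= NEUTRAL_BAND:
        return "positive"
    if compound <= -HIGH_CUTOFF:
        return "highly negative"
    if compound <= -NEUTRAL_BAND:
        return "negative"
    return "neutral"


# Illustrative scores only.
for score in (0.95, 0.4, 0.0, -0.4, -0.95):
    print(score, polarity_group(score))
```

Moving either cut-off changes the group sizes, so any comparison with the reported distribution would need the study's own thresholds.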
Latent Dirichlet allocation (LDA), as a popular and well-established approach for topic analysis [
To determine the optimal number of topics with favorable model performance, we used a coherence score; however, because the number of samples for the highly positive and highly negative groups was small, we combined positive and highly positive groups (into a positive group) and negative and highly negative groups (into a negative group). Then, we applied topic modeling algorithms on 3 groups: positive, neutral, and negative. We used the topic coherence value to measure the modeling performance. Since the data set was very large, the experiments were run under the server environment with C5 computing type series IV 64-core CPU and 128 GB RAM. Then, based on the performance, we selected the optimal number of topics for each polarity group. The optimal topic numbers for positive, neutral, and negative were 12, 10, and 10, respectively (
Model performance for topic numbers for (a) positive, (b) neutral, and (c) negative tweets.
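The selection rule described above, choosing the topic number with the best coherence, can be sketched in a few lines. The sketch assumes that one LDA model has already been trained per candidate topic number and that its coherence score has been recorded. The function name is hypothetical, and the scores below are invented purely to show the mechanics; they are not the values reported in the study.

```python
def best_topic_number(coherence_by_k):
    """Return the topic number with the highest coherence score.

    Ties are broken in favor of the smaller topic number, on the
    assumption that a simpler model is preferable when scores are equal.
    """
    if not coherence_by_k:
        raise ValueError("no coherence scores supplied")
    return min(coherence_by_k, key=lambda k: (-coherence_by_k[k], k))


# Invented coherence scores for illustration only.
example_scores = {8: 0.41, 10: 0.47, 12: 0.52, 14: 0.49}
print(best_topic_number(example_scores))
```

The same function would be applied separately to the positive, neutral, and negative groups, since each group has its own coherence curve.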
Overall, positive sentiment was stronger than negative sentiment (
There were 6899 highly positive tweets, 398,661 positive tweets, 245,976 neutral tweets, 204,084 negative tweets, and 1508 highly negative tweets (
Overall daily average sentiment score.
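A daily average sentiment score of this kind can be computed by grouping compound scores by calendar day and taking the mean. The following is a minimal sketch under that assumption; the function name and the input format, pairs of a date and a compound score, are hypothetical, and the sample values are illustrative rather than study data.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def daily_average(scored_tweets):
    """Average compound scores per day.

    scored_tweets: iterable of (date, compound_score) pairs.
    Returns a dict mapping each date to its mean score, in date order.
    """
    by_day = defaultdict(list)
    for day, score in scored_tweets:
        by_day[day].append(score)
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


# Illustrative values only.
sample_scores = [
    (date(2020, 12, 15), 0.25),
    (date(2020, 12, 14), 0.5),
    (date(2020, 12, 14), 0.25),
]
averages = daily_average(sample_scores)
for day, value in averages.items():
    print(day.isoformat(), value)
```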
Overall sentiment trend.
Sentiment polarity category distribution.
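The reported shares follow directly from the tweet counts per polarity group. The short check below recomputes them from the counts given in this article; the variable names are arbitrary.

```python
# Tweet counts per polarity group, as reported in this article.
counts = {
    "highly positive": 6899,
    "positive": 398661,
    "neutral": 245976,
    "negative": 204084,
    "highly negative": 1508,
}

total = sum(counts.values())
shares = {group: round(100 * n / total, 2) for group, n in counts.items()}

# Combined groups used for topic modeling.
positive_combined = counts["highly positive"] + counts["positive"]
negative_combined = counts["highly negative"] + counts["negative"]

print(total)
for group, share in shares.items():
    print(f"{group}: {share:.2f}%")
print(f"positive incl. highly positive: {100 * positive_combined / total:.2f}%")
print(f"negative incl. highly negative: {100 * negative_combined / total:.2f}%")
```

The counts sum to the 857,128 tweets retained after filtering, and the combined groups contain 405,560 and 205,592 tweets, respectively.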
The percentage of negative sentiments was stable (
Sentiment polarity distribution by month.
Common words for (a) highly positive, (b) highly negative, (c) positive, and (d) negative tweets.
Additionally, the names of COVID-19 vaccine manufacturers
Daily average positive and negative sentiment scores for (a) Johnson & Johnson, (b) AstraZeneca, (c) Pfizer, and (d) Moderna vaccines and sentiment trends for (e) Johnson & Johnson, (f) AstraZeneca, (g) Pfizer, and (h) Moderna vaccines.
For Pfizer and Moderna vaccines, positive and negative sentiment curves were found to intersect only in December 2020 and January 2021, and the sentiment trends were stable, which reflected public concerns in the beginning, when the vaccines were first approved, followed by increasing levels of confidence in the vaccines as more and more people became vaccinated.
Daily standard deviation of sentiments for (a) Johnson & Johnson, (b) AstraZeneca, (c) Pfizer, and (d) Moderna vaccines.
Sentiment polarity distributions for Pfizer, AstraZeneca, Johnson & Johnson, and Moderna vaccines.
Topics suggested that people felt happy and grateful that a vaccine had been approved (
Top 5 positive (including highly positive) topics.
Topic ID | Tweets, n (%) | Keywords | Topic |
POS_05 | 251,979 (62.13) | people, take, say, make, go, good, need, help, well, give | Planning for getting vaccination |
POS_07 | 76,029 (18.75) | get, today, dose, first, feel, shoot, day, second, shot, be | Getting vaccinated |
POS_09 | 21,127 (5.21) | share, read, important, health, join, question, public, information, community, concern | Vaccine information and knowledge |
POS_11 | 14,286 (3.52) | thank, clinic, staff, support, team, volunteer, work, process, amazing, effort | Thanks for health care workers |
POS_01 | 6963 (1.72) | effective, risk, variant, pause, blood_clot, virus, benefit, less, rare, infection | Side effects |
The main neutral topics were vaccination appointment (79,710/245,976, 32.41%) and getting vaccinated (40,532/245,976, 16.48%) (
Top 5 neutral topics.
Topic ID | Tweets, n (%) | Keywords | Topic |
NEU_05 | 79,710 (32.41) | get, today, appointment, shoot, available, be, call, wait, come, schedule | Vaccination appointment |
NEU_02 | 40,532 (16.48) | dose, first, receive, second, shot, pfizer, day, week, administer, fully | Getting vaccinated |
NEU_09 | 31,409 (12.77) | say, take, go, people, time, still, need, rare, would, think | Vaccine hesitancy |
NEU_03 | 17,156 (6.97) | update, read, find, late, live, news, check, watch, question, link | Vaccine news |
NEU_06 | 17,129 (6.96) | may, start, age, year, week, open, next, eligible, site, begin | Vaccine eligibility |
Negative topics (
Negative (including highly negative) topics.
Topic ID | Tweets, n (%) | Keywords | Topic |
NEG_05 | 115,206 (56.04) | get, people, take, go, say, make, know, stop, need, still | Vaccine hesitancy |
NEG_00 | 19,690 (9.58) | risk, death, case, report, blood_clot, rare, severe, low, receive, blood | Extreme side effects |
NEG_06 | 17,154 (8.34) | government, country, pay, company, rollout, state, plan, fail, stock, supply | Vaccine supply and rollout |
NEG_04 | 14,125 (6.87) | get, shoot, feel, arm, day, hour, today, shot, sore, second | Common side effects |
NEG_07 | 10,248 (4.98) | appointment, wait, available, age, site, open, today, hospital, group, offer | Vaccination appointment |
NEG_03 | 8080 (3.93) | use, emergency, say, suspend, break, astrazeneca, official, country, shortage, pause | AstraZeneca suspension |
NEG_02 | 7100 (3.45) | dose, week, first, second, receive, next, day, ruin, delay, administer | Vaccine administration |
NEG_09 | 6151 (2.99) | read, question, health, public, story, information, hesitancy, register, community, explain | Vaccine information and community |
NEG_01 | 4471 (2.17) | pandemic, virus, new, fight, variant, lockdown, avoid, coronavirus, spread, restriction | Spread avoidance |
NEG_08 | 3367 (1.64) | cause, cancer, clot, woman, trust, product, doctor, body, choice, damage | Extreme side effects on vulnerable groups |
We found that 47.32% of the tweets (405,560/857,128) demonstrated positive (including highly positive) attitudes toward COVID-19 vaccines. The main topics included encouraging people to get vaccinated and conveying hope and gratitude for future life as a result of vaccine approval. Overall, 23.99% of the tweets (205,592/857,128) expressed negative (including highly negative) attitudes and concerns. The main concerns regarding COVID-19 vaccines were side effects of vaccination, serious adverse reactions, and vaccine supply.
Side effects, such as pain at the injection site (ie, NEG_05) were discussed the most (of all negative topics) throughout the period (
Heatmap of negative topic evolution. The x-axis represents the week in the year. Lighter colors correspond to topics that are discussed more.
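The values behind such a heatmap can be derived by counting, for each week, the share of tweets assigned to each topic. The sketch below assumes that every tweet has already been assigned a single dominant topic and that weeks follow the ISO calendar. Both points are assumptions, as are the function name and the sample assignments, which are illustrative only.

```python
from collections import Counter, defaultdict
from datetime import date


def weekly_topic_shares(assignments):
    """Compute each topic's share of tweets per ISO week.

    assignments: iterable of (date, topic_id) pairs.
    Returns {(iso_year, iso_week): {topic_id: share}}.
    """
    counts = defaultdict(Counter)
    for day, topic in assignments:
        iso = day.isocalendar()
        # Key on (year, week) because the study period spans 2 calendar years.
        counts[(iso[0], iso[1])][topic] += 1
    shares = {}
    for week, topic_counts in sorted(counts.items()):
        week_total = sum(topic_counts.values())
        shares[week] = {t: n / week_total for t, n in topic_counts.items()}
    return shares


# Illustrative assignments only, not study data.
sample_assignments = [
    (date(2021, 3, 15), "NEG_05"),
    (date(2021, 3, 16), "NEG_05"),
    (date(2021, 3, 17), "NEG_00"),
    (date(2021, 3, 18), "NEG_03"),
    (date(2021, 3, 22), "NEG_05"),
]
topic_shares = weekly_topic_shares(sample_assignments)
for week, dist in topic_shares.items():
    print(week, dist)
```

Each inner dictionary corresponds to one column of the heatmap, with weeks on the x-axis and topics on the y-axis.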
Most sentiments toward COVID-19 vaccines were neutral and positive. Positive sentiment was stronger than negative sentiment throughout the period. Previous results from research conducted from March 1 to November 22, 2020 (before vaccines were available) [
By applying topic modeling to our data set, we found that the main topic in the positive and neutral domain was encouraging people to get vaccinated. In general, we discovered that vaccines are becoming widely accepted by the public as time passes. The main topic of our negative data set was the severe side effects of vaccination. When some social media outlets reported possible vaccination side effects, the concerns were discussed frequently on different social media platforms, such as Twitter, and possibly impacted individual decisions. Before vaccines were available, discussions on vaccines were centered around clinical trials and vaccine availability [
We also found that, among the negative tweets, the main concerns other than vaccine hesitancy were side effects (NEG_00 and NEG_04) and vaccine supply and rollout (NEG_06). This finding is consistent with those from previous studies [
Overall, it was observed that positive sentiment distribution decreased, neutral sentiment distribution increased, and negative sentiment distribution was stable. However, positive sentiment was dominant throughout the study period (December 14, 2020 to April 30, 2021). Positive sentiment decreased in March and April 2021, likely because of the extreme side effects (blood clotting) reported in the news for Johnson & Johnson and AstraZeneca vaccines. Use of the AstraZeneca vaccine was even stopped in Europe briefly [
In the very beginning, such side effects were extensively discussed. Some news outlets reported severe side effects, such as Bell palsy and even death [
Sentiment trend findings were consistent with those from a previous study [
In this study, we mainly focused on textual information from the Twitter platform. However, users may be distributed among different social media platforms and different locations according to their usage, language, and preferences. Therefore, the methods used in our study can be extended to different social media platforms. It is also possible to use geographical filters on location information or to work on other languages to precisely differentiate between the significant issues and concerns among the different cultures or demographics.
Furthermore, our model can be extended to other research problems. For example, future studies should focus on negative tweets to determine whether misinformation exists or to identify misinformation on social media and propose suggestions for how to minimize the spread of such misinformation. Moreover, it may be plausible in the future to train a topic model with LDA and deep learning to forecast event topics and trends.
Our work profiles the spectrum of public sentiments toward vaccination and the main concerns underlying these views since the rollout of vaccines. These findings demonstrate the effectiveness of sentiment-based topic modeling in identifying topics and trends in polarity groups and in revealing the dynamic nature of public attitudes toward vaccination in the midst of evolving situations and changing public measures during the pandemic. Adding sentiment analysis and topic modeling when monitoring COVID-19 vaccine awareness can help researchers uncover time-based viewpoints underlying the dynamic public attitude toward vaccination on a large scale and devise tailored communication strategies to promote vaccination.
Related work on sentiment analysis or topic modeling.
CDC: Centers for Disease Control
FDA: US Food and Drug Administration
LDA: latent Dirichlet allocation
VADER: Valence Aware Dictionary for Sentiment Reasoning
This study was supported by a San Diego State University Master Research Scholarship and by Research Funds from Fowler College of Business. We thank Professor David Banks from Duke University for providing helpful and constructive comments and suggestions.
None declared.