COVID-19 Vaccine Tweets After Vaccine Rollout: Sentiment–Based Topic Modeling

Background: COVID-19 vaccines are one of the most effective preventive strategies for containing the pandemic. Having a better understanding of the public’s conceptions of COVID-19 vaccines may aid in the effort to promptly and thoroughly vaccinate the community. However, because no empirical research has yet fully explored the public’s vaccine awareness through sentiment–based topic modeling, little is known about the evolution of public attitude since the rollout of COVID-19 vaccines.
Objective: In this study, we specifically focused on tweets about COVID-19 vaccines (Pfizer, Moderna, AstraZeneca, and Johnson & Johnson) after vaccines became publicly available. We aimed to explore the overall sentiments and topics of tweets about COVID-19 vaccines, as well as how such sentiments and main concerns evolved.
Methods: We collected 1,122,139 tweets related to COVID-19 vaccines from December 14, 2020, to April 30, 2021, using Twitter’s application programming interface. We removed retweets and duplicate tweets to avoid data redundancy, which resulted in 857,128 tweets. We then applied sentiment–based topic modeling by using the compound score to determine sentiment polarity and the coherence score to determine the optimal topic number for different sentiment polarity categories. Finally, we calculated the topic distribution to illustrate the topic evolution of main concerns.
Results: Overall, 398,661 (46.51%) tweets were positive, 204,084 (23.81%) were negative, 245,976 (28.70%) were neutral, 6899 (0.80%) were highly positive, and 1508 (0.18%) were highly negative. The main topics of positive and highly positive tweets were planning for getting vaccination (251,979/405,560, 62.13%), getting vaccination (76,029/405,560, 18.75%), and vaccine information and knowledge (21,127/405,560, 5.21%). The main concerns in negative and highly negative tweets were vaccine hesitancy (115,206/205,592, 56.04%), extreme side effects of the vaccines (19,690/205,592, 9.58%), and vaccine supply and rollout (17,154/205,592, 8.34%). During the study period, negative sentiment trends were stable, while positive sentiments could be easily influenced. Topic heatmap visualization demonstrated how main concerns changed during the current widespread vaccination campaign.
Conclusions: To the best of our knowledge, this is the first study to evaluate public COVID-19 vaccine awareness and awareness trends on social media with automated sentiment–based topic modeling after vaccine rollout. Our results can help policymakers and research communities track public attitudes toward COVID-19 vaccines and help them make decisions to promote the vaccination campaign.


Introduction
Background
COVID-19 vaccines are one of the most effective preventive strategies for containing the pandemic and restoring normal life [1]. The outcomes of this strategy depend heavily on vaccination coverage: herd immunity requires at least 70% of the population to be immune, depending on how contagious the COVID-19 variant in question is and how effective the vaccine is [2]. However, such a high rate of vaccination cannot be reached without the cooperation of the general public [3][4][5]. A variety of factors may negatively affect how the public perceives and reacts to these vaccines. Such barriers may stem from conspiracy theories [6], general hesitancy toward vaccines [4], and doubts regarding new mRNA vaccine technologies [7]. Infodemic management, that is, managing information overload, including false or misleading information [8], should be used during the COVID-19 pandemic by listening to community concerns, preventing the spread of misleading information [9], and examining the human factors contributing to COVID-19 transmission [10]. Thus, to promote vaccine awareness and facilitate vaccine rollout, it is imperative to gain a timely understanding of the public's attitude toward vaccination and develop tailored communication strategies to address their concerns.
Generally, characterizing public vaccine attitudes as part of public health surveillance can be achieved via social media-based text mining or other traditional methodologies, such as conducting surveys or experiments. Social media-based text mining has become increasingly popular because of its effectiveness and efficiency; the major merit of this big data analysis is that it addresses several of the limitations of traditional methodologies, such as the inability to track real-time trends [4,11]. Public health monitoring on social media has proven to be a powerful tool for analyzing public health discussions on a variety of topics, such as pandemics and vaccination [12][13][14][15][16][17][18][19][20][21][22][23][24]. Such work has been conducted for the COVID-19 pandemic (Multimedia Appendix 1). However, because of the rapid COVID-19 vaccine rollout, dedicated social media-based sentiment analysis studies on vaccine awareness have just started to emerge [3,22-24]. Some of these studies [3,22] relied on natural language processing techniques to conduct large-scale sentiment analysis about vaccines, while others [23,24] investigated vaccination hesitancy using manual content analysis. Overall, these studies lacked either the capability to automatically track public attitudes (in manual content analysis) or a comprehensive view of both topics and associated sentiments. Furthermore, exploring how public sentiment and concerns evolve throughout the current vaccination campaign may allow policymakers to make timely and informed decisions to encourage vaccination.

Study Objectives
We aimed to combine sentiment analysis and topic modeling in order to address the following research questions: What are the general sentiments on COVID-19 vaccines? What are the topics that shape the sentiments? How do concerns (ie, topics with negative sentiments) evolve over time?

Sentiment Analysis
We used the Valence Aware Dictionary and sEntiment Reasoner (VADER) lexicon for analysis. During preprocessing, we did not remove the hashtag content because it often contained meaningful information such as the brand of the vaccine. VADER is a rule-based sentiment analysis tool that has been shown to perform as well as or even better than other sentiment analysis tools on social media texts in most cases, since it is specifically attuned to sentiments expressed on social media [25]. VADER produces 4 scores: positive, neutral, negative, and compound. The positive, neutral, and negative scores each represent the proportion of the text that falls into the given category. The compound score is calculated by summing the valence scores of each word found in the lexicon, adjusting the value according to heuristic rules, and normalizing between −1 and +1 [25]. The compound score is a useful metric for measuring the sentiment of a given text in a single dimension.
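The normalization step can be illustrated with a minimal sketch. It assumes the form used in the open-source VADER implementation, x / sqrt(x² + α) with α = 15; the function name and the example valence values are hypothetical, and the heuristic adjustments (negation, intensifiers, punctuation) are omitted.

```python
import math


def normalize_compound(valence_sum, alpha=15.0):
    """Map an unbounded sum of word valences into the interval [-1, 1]."""
    score = valence_sum / math.sqrt(valence_sum * valence_sum + alpha)
    # Clamp to guard against floating-point drift at the extremes.
    return max(-1.0, min(1.0, score))


# Hypothetical per-word valences for one tweet (not taken from the VADER lexicon).
valences = [1.9, 3.1, -0.5]
compound = normalize_compound(sum(valences))
print(round(compound, 4))
```

Because the denominator grows with the magnitude of the sum, long tweets with many sentiment-bearing words approach but never exceed ±1, which is what makes the compound score comparable across tweets of different lengths.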
We classified each tweet into 1 of 5 groups (Table 2) based on the distributions of compound, positive, and negative scores: highly positive (compound score >0.001 and positive sentiment score >0.5), positive (compound score >0.001 and positive sentiment score <0.5), highly negative (compound score <−0.001 and negative sentiment score >0.5), negative (compound score <−0.001 and negative sentiment score <0.5), and neutral (if none of the conditions was satisfied). Examples of classified tweets:
"it s an exciting day with the arrival of the first coronavirusvaccine it gives me great hope for 2021 covid19vaccine" Positive
"it s fake you re all stupid covidvaccine" Highly negative
"how do we know that after 6 9 months there are no adverse effects of the vaccine or that it s ineffective and what s the response if in the event these emergency approvals have larger ramifications any mechanism being put together covid_19 covid19vaccine" Negative
"help is on the way 1st doses of covid19vaccine arrived in north carolina initial vaccine supply is limited and will go to a small number of public health and hospital workers at high risk of exposure more doses are on the way but until then practice your 3ws"
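The grouping rule can be written as a small function. This is a sketch rather than the authors' code: it assumes that the negative-side threshold mirrors the positive one (compound score below −0.001), which is what a nonempty neutral group requires, and the function and parameter names are hypothetical.

```python
def classify(compound, pos, neg, eps=0.001):
    """Assign one of five sentiment groups from VADER-style scores."""
    if compound > eps and pos > 0.5:
        return "highly positive"
    if compound > eps and pos < 0.5:
        return "positive"
    if compound < -eps and neg > 0.5:
        return "highly negative"
    if compound < -eps and neg < 0.5:
        return "negative"
    # Anything that satisfies none of the four conditions is neutral.
    return "neutral"


print(classify(0.62, 0.21, 0.0))   # positive
print(classify(-0.70, 0.0, 0.64))  # highly negative
print(classify(0.0, 0.0, 0.0))     # neutral
```

The conditions are implemented literally, so a tweet whose positive or negative score is exactly 0.5 falls through to neutral, as the rule is stated.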

Topic Modeling
Latent Dirichlet allocation (LDA), a popular and well-established approach for topic analysis [26], is a three-level hierarchical Bayesian model that relies on the bag-of-words model [27]. LDA models each document as a mixture of topics and each topic as a probability distribution over words. The number of topics is a key parameter of the LDA model. To prevent the misclassification of other topics into vaccine and nonvaccine topics, we removed some vaccine-related keywords, including "vaccine," "vaccines," "vaccination," "covidvaccine," and "covid." This preprocessing decision is also supported by experimental results, which showed that, without removing these words, up to 96% of tweets were assigned to a single main topic that carried little meaningful information.
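The keyword removal described above can be sketched as a preprocessing step that runs before the bag-of-words representation is built. The tokenizer, the function names, and the example tweet are illustrative assumptions; only the five keywords come from the text.

```python
import re

# Keywords named in the text; left in place, they pull most tweets into one generic topic.
REMOVED_KEYWORDS = {"vaccine", "vaccines", "vaccination", "covidvaccine", "covid"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def strip_keywords(tokens):
    """Drop tokens that exactly match one of the removed keywords."""
    return [t for t in tokens if t not in REMOVED_KEYWORDS]


tweet = "Got my covid vaccine today, the vaccination site was well organized"
print(strip_keywords(tokenize(tweet)))
# ['got', 'my', 'today', 'the', 'site', 'was', 'well', 'organized']
```

Matching is exact in this sketch, so a fused hashtag token such as "covid19vaccine" would be kept; whether the original pipeline also removed such variants is not stated.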
To determine the optimal number of topics with favorable model performance, we used the topic coherence score. Because the number of samples in the highly positive and highly negative groups was small, we combined the positive and highly positive groups (into a positive group) and the negative and highly negative groups (into a negative group). We then applied topic modeling algorithms to 3 groups: positive, neutral, and negative. Since the data set was very large, the experiments were run in a server environment with C5 computing type series IV 64-core CPU and 128 GB RAM. Based on the coherence values, we selected the optimal number of topics for each polarity group. The optimal topic numbers for positive, neutral, and negative were 12, 10, and 10, respectively (Figure 2).
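Selecting the topic number by coherence can be sketched as follows. The text does not say which coherence measure was used, so this example substitutes the UMass measure, which needs only document co-occurrence counts, purely for illustration; all names are hypothetical, and every top word is assumed to occur in at least one document.

```python
import math


def umass_coherence(top_words, docs):
    """UMass coherence of one topic; top_words is ordered from most to least probable."""
    doc_sets = [set(d) for d in docs]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            joint = doc_freq(top_words[m], top_words[l])
            score += math.log((joint + 1) / doc_freq(top_words[l]))
    return score


def best_topic_number(coherence_by_k):
    """Return the candidate topic number with the highest coherence."""
    return max(coherence_by_k, key=coherence_by_k.get)
```

In practice, one model would be fitted per candidate topic number within each polarity group, the per-topic coherence values averaged, and the averages passed to `best_topic_number`. Words that co-occur often yield values near 0, while words that never co-occur yield strongly negative values.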

Sentiment Analysis
Overall, positive sentiment was stronger than negative sentiment (Figure 3).
Figure 7 shows word clouds with profanities removed for highly positive, highly negative, positive, and negative tweets. Apart from "vaccine" and "COVID," which had the highest frequencies, the most common positive words in the highly positive group were "great," "happy," and "love"; the most common negative words in the highly negative group were "kill," "bad," and "death"; the most common positive words in the positive group were "thank," "like," and "health"; and the most common negative words in the negative group were "death," "clot," and "risk."
For Pfizer and Moderna vaccines, positive and negative sentiment curves were found to intersect only in December 2020 and January 2021, and the sentiment trends were stable, which reflected public concerns in the beginning, when the vaccines were first approved, followed by increasing levels of confidence in the vaccines as more and more people became vaccinated.
Figure 9 shows the standard deviation of sentiments for each vaccine. For the Pfizer and Moderna vaccines, the standard deviation lines are flat, which means that the sentiments for these vaccines were very stable and did not fluctuate much. However, for the Johnson & Johnson and AstraZeneca vaccines, the standard deviation of sentiments changed drastically over time. For instance, the standard deviation for the Johnson & Johnson vaccine decreased, implying a higher degree of consensus regarding this specific vaccine. The opposite was true for the AstraZeneca vaccine: the increased sentiment variation indicated that attitudes toward it became more divided over time.
Figure 10 shows the percentages of tweets for each vaccine in each sentiment polarity; the percentages in each sentiment group are very close to each other.
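A per-vaccine standard deviation over time, as plotted in Figure 9, could be computed along the following lines. The text does not state which score or time window was used; this sketch assumes compound scores grouped by week, and the record layout and function name are hypothetical.

```python
from collections import defaultdict
from statistics import pstdev


def weekly_sentiment_spread(records):
    """records: iterable of (vaccine, week, compound_score) tuples.

    Returns {vaccine: {week: population standard deviation of the scores}}.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for vaccine, week, compound in records:
        buckets[vaccine][week].append(compound)
    return {
        vaccine: {week: pstdev(scores) for week, scores in sorted(weeks.items())}
        for vaccine, weeks in buckets.items()
    }
```

A flat series for a vaccine then corresponds to stable disagreement among tweets, a falling series to growing consensus, and a rising series to increasingly divided attitudes.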

Positive Topics
Positive topics (Table 3) suggested that people felt happy and grateful that a vaccine had been approved, considered it important to get vaccinated, were thankful to health care staff for their efforts, and were waiting to become eligible for vaccination.

Negative Topics
Negative topics (Table 5) demonstrated the public's main concerns regarding COVID-19 vaccines. In general, the public mainly cared about the side effects of vaccines, including common side effects, such as soreness after receiving a vaccine, and serious adverse reactions, such as death. Given the strict storage requirements, the vaccines' supply chain and rollout were the second most important issue of public concern. Other negative topics involved the vaccination appointment, coronavirus variants, vaccination for women and patients with cancer (people who are at high risk), fake news, and misinformation.
We found that 47.32% of the tweets (405,560/857,128) demonstrated positive (including highly positive) attitudes toward COVID-19 vaccines. The main topics included encouraging people to get vaccinated and conveying hope and gratitude for future life as a result of vaccine approval. Overall, 23.99% of the tweets (205,592/857,128) expressed negative (including highly negative) attitudes and concerns. The main concerns regarding COVID-19 vaccines were side effects of vaccination, serious adverse reactions, and vaccine supply.
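The combined percentages follow directly from the five group counts reported in the abstract, as the short check below shows. Only the variable and function names are introduced here; all counts are taken from the text.

```python
# Tweet counts per sentiment group, as reported in the abstract.
counts = {
    "positive": 398_661,
    "negative": 204_084,
    "neutral": 245_976,
    "highly positive": 6_899,
    "highly negative": 1_508,
}

total = sum(counts.values())
positive_all = counts["positive"] + counts["highly positive"]
negative_all = counts["negative"] + counts["highly negative"]


def share(part, whole):
    """Percentage of whole, rounded to 2 decimal places."""
    return round(100 * part / whole, 2)


print(total)                               # 857128
print(positive_all, share(positive_all, total))  # 405560 47.32
print(negative_all, share(negative_all, total))  # 205592 23.99
```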

Topic Evolution
Side effects, such as pain at the injection site (ie, NEG_05), were discussed the most (of all negative topics) throughout the period (Figure 11). Moreover, as the number of people who received the vaccine increased, the discussion of side effects increased. Topics such as vaccine supply (ie, NEG_00) and extreme side effects (ie, NEG_06) were discussed less, but at a consistent level throughout the period.
Figure 11. Heatmap of negative topic evolution. The x-axis represents the week in the year. Lighter colors correspond to topics that are discussed more.
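The weekly topic distribution behind a heatmap such as Figure 11 can be sketched as below. It assumes each tweet carries a posting date and a single dominant topic label, and that shares are normalized within each week; the text specifies neither, and the function name is hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def weekly_topic_shares(tweets):
    """tweets: iterable of (posting_date, topic_id) pairs.

    Returns {(iso_year, iso_week): {topic_id: share of that week's tweets}}.
    """
    counts = defaultdict(Counter)
    for day, topic in tweets:
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        counts[(iso[0], iso[1])][topic] += 1
    return {
        week: {topic: n / sum(c.values()) for topic, n in c.items()}
        for week, c in sorted(counts.items())
    }
```

Keying on the ISO year as well as the week number keeps December 2020 and early 2021 in chronological order, which matters because the study period crosses the year boundary. Each inner dictionary is one column of the heatmap.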

General Sentiments
Most sentiments toward COVID-19 vaccines were neutral and positive. Positive sentiment was stronger than negative sentiment throughout the period. Previous results from research conducted from March 1 to November 22, 2020 (before vaccines were available) [3] were similar: the dominant sentiments were positive and neutral. However, in this study, the proportion of negative sentiment (205,592/857,128, 23.99%) was lower than that in [3] (30.57%). This suggests that after the COVID-19 vaccines became available, their effectiveness in reducing the risk of infection started to manifest in the real world, and people started expressing fewer doubts about vaccines on social media. Vaccine trials, social media, and government interventions may contribute to alleviating public concerns [31].

Concerns and Topics That Shape Attitudes
By applying topic modeling to our data set, we found that the main topic in the positive and neutral domain was encouraging people to get vaccinated. In general, we discovered that vaccines are becoming widely accepted by the public as time passes. The main topic of our negative data set was the severe side effects of vaccination. When some social media outlets reported possible vaccination side effects, the concerns were discussed frequently on different social media platforms, such as Twitter, and possibly impacted individual decisions. Before vaccines were available, discussions on vaccines were centered around clinical trials and vaccine availability [12]. However, upon vaccine rollout, the concerns shifted dramatically to common side effects, which dominated the discussion throughout the study period (from December 14, 2020 to April 30, 2021). Hence, timely monitoring of the public attitude can help guide public health officials to communicate more effectively with the public.
We also found that among the negative tweets, other than vaccine hesitancy, the main concerns were side effects (NEG_00 and NEG_04) and vaccine supply and rollout (NEG_06). This finding is consistent with those from previous studies [24,32,33]. For example, in a study on vaccination hesitancy in Canada [24], it was found that vaccination hesitancy stemmed from mistrust toward vaccine development, lack of knowledge about COVID-19 vaccines, and suspicion about political and authority figures who were not taking the vaccine. In another study [32] employing a questionnaire for the Israeli population, the results showed that the top 3 concerns regarding COVID-19 vaccines were quality control, side effects, and doubtful efficiency. Another survey conducted in the United States and Canada showed that vaccine rejection is very strongly related to vaccine benefits, vaccine safety, and unforeseen future effects [33]. Overall, our findings were similar: the top concerns were vaccine safety, side effects, vaccine supply, and government policy.

Changes by Month
Overall, the share of positive sentiment decreased, the share of neutral sentiment increased, and the share of negative sentiment was stable. However, positive sentiment was dominant throughout the study period (December 14, 2020 to April 30, 2021). Positive sentiment decreased in March and April 2021, likely because of the extreme side effects (blood clotting) reported in the news for Johnson & Johnson and AstraZeneca vaccines. Use of the AstraZeneca vaccine was even stopped in Europe briefly [29], and the FDA and CDC called for a pause on the use of the Johnson & Johnson vaccine in the United States [28]. This may have caused positive sentiment to decrease while neutral sentiment, rather than negative sentiment, increased, because people tended to feel neutral rather than very negative toward such a pause.
In the very beginning, such side effects were extensively discussed. Some news outlets reported severe side effects, such as Bell palsy and even death [34], after receiving the vaccine, which seemed to coincide with more negative sentiments. Both Pfizer and Moderna vaccines are mRNA vaccines, which is a new type of vaccine that has not been used before [35]. This caused the general public to have concerns regarding the long-term side effects of these novel vaccines [7]. In the beginning, the lack of knowledge about COVID-19 and mRNA vaccines shaped the public's concerns. However, as more people were vaccinated over time, more people were able to observe how these vaccines helped steadily decrease the number of new cases and deaths per day as well as the hospitalization rates, implying that the pandemic is under control thanks to these vaccines. This in turn resulted in an increasing number of people seeking to become vaccinated, because extreme side effects are very rare and might be associated with misinformation and because the common side effects are regarded as tolerable.
Sentiment trend findings were consistent with those from a previous study [22], in which a vaccine acceptance experiment using Sina Weibo (a popular social media platform in China) demonstrated that positive attitudes were dominant, that the Chinese population became more positive about side effects over time, and that misunderstanding about vaccination is one of the concerns that affect vaccine acceptance.

Limitations and Future Work
In this study, we mainly focused on textual information from the Twitter platform. However, users may be distributed among different social media platforms and different locations according to their usage, language, and preferences. Therefore, the methods used in our study can be extended to different social media platforms. It is also possible to use geographical filters on location information or to work on other languages to precisely differentiate between the significant issues and concerns among the different cultures or demographics. Furthermore, our model can be extended to other research problems. For example, future studies should focus on negative tweets to determine whether misinformation exists or to identify misinformation on social media and propose suggestions for how to minimize the spread of such misinformation. Moreover, it may be plausible in the future to train a topic model with LDA and deep learning to forecast event topics and trends.

Conclusions
Our work profiles the spectrum of public sentiments toward vaccination and the main concerns underlying these views since the rollout of vaccines. These findings demonstrate the effectiveness of sentiment-based topic modeling in identifying topics and trends in polarity groups and in revealing the dynamic nature of public attitudes toward vaccination in the midst of evolving situations and changing public measures during the pandemic. Adding sentiment analysis and topic modeling when monitoring COVID-19 vaccine awareness can help researchers uncover time-based viewpoints underlying the dynamic public attitude toward vaccination on a large scale and devise tailored communication strategies to promote vaccination.