This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
The unprecedented speed of COVID-19 vaccine development and approval has raised public concern about its safety. However, studies on public discourses and opinions on social media focusing on adverse events (AEs) related to COVID-19 vaccine are rare.
This study aimed to analyze Korean tweets about COVID-19 vaccines (Pfizer, Moderna, AstraZeneca, Janssen, and Novavax) after the vaccine rollout, explore the topics and sentiments of tweets regarding COVID-19 vaccines, and examine their changes over time. We also analyzed topics and sentiments focused on AEs related to vaccination using only tweets with terms about AEs.
We devised a sophisticated methodology consisting of 5 steps: keyword search on Twitter, data collection, data preprocessing, data analysis, and result visualization. We used the Twitter Representational State Transfer application programming interface for data collection. A total of 1,659,158 tweets were collected from February 1, 2021, to March 31, 2022. Finally, 165,984 data points were analyzed after excluding retweets, news, official announcements, advertisements, duplicates, and tweets with <2 words. We applied a variety of preprocessing techniques that are suitable for the Korean language. We ran a suite of analyses using various Python packages, such as latent Dirichlet allocation, hierarchical latent Dirichlet allocation, and sentiment analysis.
The topics related to COVID-19 vaccines have a very large spectrum, including vaccine-related AEs, emotional reactions to vaccination, vaccine development and supply, and government vaccination policies. Among them, the top major topic was AEs related to COVID-19 vaccination. The AEs ranged from the adverse reactions listed in the safety profile (eg, myalgia, fever, fatigue, injection site pain, myocarditis or pericarditis, and thrombosis) to unlisted reactions (eg, irregular menstruation, changes in appetite and sleep, leukemia, and deaths). Our results showed a notable difference in the topics for each vaccine brand. The topics pertaining to the Pfizer vaccine mainly mentioned AEs. Negative public opinion has prevailed since the early stages of vaccination. In the sentiment analysis based on vaccine brand, the topics related to the Pfizer vaccine expressed the strongest negative sentiment.
Considering the discrepancy between academic evidence and public opinions related to COVID-19 vaccination, the government should provide accurate information and education. Furthermore, our study suggests the need for management to correct the misinformation related to vaccine-related AEs, especially those affecting negative sentiments. This study provides valuable insights into the public discourses and opinions regarding COVID-19 vaccination.
Despite progress in reducing disease mortality and morbidity in regions with high vaccination rate, challenges remain owing to uncertainties from the recently identified variants of COVID-19 [
Understanding public opinion regarding COVID-19 vaccines is important for public health. Numerous studies have attempted to analyze topics and sentiments regarding COVID-19 vaccines using social media data, such as that from Twitter and Facebook [
In addition, public opinion regarding COVID-19 vaccines may be closely related to concerns about adverse events (AEs). The unprecedented speed of messenger RNA vaccine development and approval has raised concerns that clinical trials were hastened and regulatory standards were relaxed [
Korea reported that 70% of its population had already been fully vaccinated within 8 months of the start of the vaccination drive on February 26, 2021 [
In this study, we explored the overall and brand-specific topics and sentiments related to COVID-19 vaccines in Korea after vaccine rollout. In addition, we examined their topic changes over time.
To guide the analysis, we posed the following research questions:
What topics have been discussed on Twitter regarding COVID-19 vaccines in Korea?
What are Twitter sentiments regarding the COVID-19 vaccines in Korea? Are they negative, positive, or neutral? Do these sentiments change over time?
What specific topics are discussed on Twitter in Korea for each vaccine brand, including Pfizer, Moderna, Janssen, and AstraZeneca? Do these topics differ among brands?
What are the sentiments toward each vaccine brand? Do these sentiments change over time?
What are the specific topics on Twitter in Korea with terms related to AEs of COVID-19 vaccines?
We reviewed a rich body of existing literature on topics and sentiments related to COVID-19 vaccines using social media data (
Prior studies on topic modeling have shown vaccine safety and efficacy, vaccine development, national vaccination policies, and vaccine supply to be major topics in a broad framework. The most commonly derived main topic was the concern about AEs, which was the main topic reported in 6 studies [
A total of 20 studies presented sentiments or emotions expressed on social media regarding COVID-19 vaccines. Most of these were analyzed based on positive, neutral, and negative feelings. Nine studies showed that positive sentiments prevailed over other sentiments (neutral and negative) [
This study was ethically approved by the KNU Institutional Review Board (KNU-2021-0118).
This section briefly describes the methods used in this study. More technical details regarding the methods are provided in
Overall methodology for COVID-19 vaccine discourse analysis on Twitter in Korea. This illustrates how our analysis was conducted from data collection to outcome visualization. API: application programming interface; HLDA: hierarchical latent Dirichlet allocation; LDA: latent Dirichlet allocation; POS: parts of speech.
We built a large corpus by collecting Korean tweets mentioning COVID-19 vaccine brands (eg, Pfizer, Moderna, AstraZeneca, Janssen, and Novavax), via academic research access [
In turn, we preprocessed the initial corpus through a series of steps from stemming through short-tweet elimination. In particular, it was found that some of the tweets in the corpus were invalid for the study. There were 2 reasons for such invalidity. First, some of the tweets were retweets. Second, some tweets were not from the general public but from government offices (eg, Gyeongsangbuk-do Provincial Office), advertisements, news media companies (eg, MBC, KBS, and SBS broadcasting company), disaster alert bots (eg, dailycoronabot), and bots collecting play scripts of a character identical to a vaccine brand’s name (eg, Jassen).
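The exclusion rules described above (retweets, non-public accounts, duplicates, and tweets with <2 words) can be sketched as a simple filter. This is a minimal illustration, not the code used in the study: the record fields `user`, `text`, and `is_retweet`, the account name `example_news_desk`, and the whitespace-based word count are assumptions made for the example.

```python
import re

# Hypothetical exclusion list; the study's actual list of excluded accounts
# is given in its supplementary material. "dailycoronabot" is named in the text.
EXCLUDED_ACCOUNTS = {"dailycoronabot", "example_news_desk"}


def is_valid_tweet(tweet, seen_texts, min_words=2):
    """Return True if the tweet survives the exclusion rules sketched here."""
    text = tweet["text"].strip()
    # Rule 1: drop retweets (flag or conventional "RT @" prefix; both are assumptions).
    if tweet.get("is_retweet") or text.startswith("RT @"):
        return False
    # Rule 2: drop accounts that are not the general public.
    if tweet["user"].lower() in EXCLUDED_ACCOUNTS:
        return False
    # Rule 3: drop duplicates, compared after collapsing whitespace.
    normalized = re.sub(r"\s+", " ", text)
    if normalized in seen_texts:
        return False
    # Rule 4: drop tweets with fewer than min_words whitespace-separated words.
    if len(normalized.split()) < min_words:
        return False
    seen_texts.add(normalized)
    return True


def filter_tweets(tweets):
    """Apply the rules in order and keep the first copy of each distinct text."""
    seen = set()
    return [t for t in tweets if is_valid_tweet(t, seen)]
```

For Korean text, a morpheme-based word count would likely be more appropriate than splitting on whitespace; the split is used here only to keep the sketch self-contained.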
We conducted a suite of analyses on the refined corpus of preprocessed tweets: latent Dirichlet allocation (LDA)-based topic modeling, hierarchical topic modeling, and sentiment analysis. First, the LDA analysis consisted of 2 phases. In the first phase, a coarse-grained LDA analysis was run to determine the general trend, regardless of vaccine brand. In the second phase, a fine-grained LDA analysis was conducted, focusing on each vaccine brand. To conduct these LDA analyses, a morpheme analysis was run, and the morphemes whose parts of speech (POS) were tagged as common noun (NNG) or proper noun (NNP) were extracted; the specified vaccine brand names were then removed to avoid affecting the analysis. The refined corpus consisting of the identified Korean nouns was weighted using term frequency–inverse document frequency (TF-IDF) for the LDA analysis. In particular, we created vaccine-specific topics using pinning topic modeling, which allows the word prior to be controlled for each topic. The weight of the word “Pfizer” was set to 1.0 in topic 0 and to 0.1 in the remaining topics. Likewise, a 10-fold weight was given to the word “Moderna” in topic 1, “AstraZeneca” in topic 2, “Janssen” in topic 3, and “Novavax” in topic 4. This allowed each brand-specific topic to be placed at a specific topic number. To determine the optimal number of topics, we examined various indicators such as coherence [
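The pinning scheme can be made concrete by writing out the per-topic word priors it implies. The sketch below only builds that weight table; it does not train a model. The function name, the 6-topic setting, and the `default` weight for unpinned words are assumptions for illustration, whereas the 1.0 and 0.1 weights and the brand-to-topic mapping follow the description above.

```python
def build_pinned_priors(vocabulary, pinned_words, num_topics,
                        high=1.0, low=0.1, default=0.01):
    """Return one {word: prior weight} dict per topic.

    pinned_words maps a word to the topic index it is pinned to.
    A pinned word gets `high` in its own topic and `low` elsewhere.
    Unpinned words get `default` in every topic (an assumed value).
    """
    priors = []
    for topic in range(num_topics):
        row = {}
        for word in vocabulary:
            if word in pinned_words:
                row[word] = high if pinned_words[word] == topic else low
            else:
                row[word] = default
        priors.append(row)
    return priors


# Brand-to-topic mapping as described in the text.
PINNED = {"Pfizer": 0, "Moderna": 1, "AstraZeneca": 2, "Janssen": 3, "Novavax": 4}
# Toy vocabulary: the 5 brand names plus 2 ordinary nouns.
VOCAB = list(PINNED) + ["fever", "hospital"]

priors = build_pinned_priors(VOCAB, PINNED, num_topics=6)
```

In a topic-modeling library that accepts per-topic word priors, each row of `priors` would be supplied before training, so that a brand name is 10 times more likely a priori to fall in its designated topic than in any other.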
Second, our hierarchical topic modeling analysis was conducted using the same TF-IDF. For this analysis, we built and trained a hierarchical LDA model 1000 times. During training, we were able to prevent the number of topics from drastically increasing by restricting new topic generation through an activated option (called freeze topics). Topic pruning was then applied to the trained model because too many topics made the interpretation challenging. To perform this pruning, the top
Third, we performed sentiment analysis on the full corpus. For this analysis, SentiStrength (version 0.0.9) [
Finally, to test the mean difference in sentiment scores for each vaccine brand, ANOVA and Tukey post hoc test were performed. Time trends were examined using an autoregression model to estimate the linear regression for time series data when the errors were autocorrelated. To analyze the structural changes in the model parameters, the Chow Test for Structural Breaks was performed using this procedure [
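The idea behind the Chow test can be illustrated with a short calculation: fit one regression line to the whole series, fit separate lines before and after a candidate break point, and compare the residual sums of squares. The sketch below uses ordinary least squares with a single time regressor and does not model autocorrelated errors, so it is a simplification of the procedure described above; the function names and the choice of 2 parameters per segment are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b, rss)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, rss


def chow_f_statistic(xs, ys, break_index, k=2):
    """Chow F statistic for a structural break at break_index.

    k is the number of parameters per segment (intercept and slope here).
    The statistic is compared against an F distribution with
    (k, n - 2k) degrees of freedom.
    """
    _, _, rss_pooled = fit_line(xs, ys)
    _, _, rss_1 = fit_line(xs[:break_index], ys[:break_index])
    _, _, rss_2 = fit_line(xs[break_index:], ys[break_index:])
    rss_split = rss_1 + rss_2
    dof = len(xs) - 2 * k
    return ((rss_pooled - rss_split) / k) / (rss_split / dof)
```

A large statistic indicates that two separate lines fit the sentiment series much better than a single line, which is evidence of a change in the trend at the candidate date.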
We implemented the methodology shown in
The main words in topics 3, 17, and 40 were related to systemic reactions after vaccination, such as muscle pain, headache, fatigue, mild fever, and chills. Topic 20 was mainly related to local reactions, such as specific areas on the body around the injection site (left arm, shoulder, and armpit). Topic 32 revealed experiences and various feelings (eg, pain, worry, relief, and gratitude) regarding vaccination and AEs. The keywords of topic 26 were about AEs related to COVID-19 vaccination of family and friends.
Top 10 topics about COVID-19 vaccines in the collected Twitter data in Korea (N=165,984).
Topics | Proportion, n (%)^a | Topic words |
Topic 17: systemic reaction after vaccination | 15,193 (9.15) | Muscle pain, headache, momsal^b, symptom, Tylenol, pain, progress, mild fever, energy, and chill |
Topic 32: emotional reaction about vaccination | 12,256 (7.38) | Worry, thank, booster shot, AE^c, relief, suffering, cross vaccination, health, flu, and mind |
Topic 20: injection site pain | 8371 (5.04) | Pain, left arm, ache, muscle pain, progress, left Geumgangmakgi^d, armpit, and muscle |
Topic 26: concern about vaccination of intimate persons | 8060 (4.86) | Mom, dad, friend, worry, little brother, AE, around, family, talking, and parents |
Topic 4: news report about vaccine | 6188 (3.73) | Press, AE, news, Giraegi^e, problem, government, Korea, United States, report, and people |
Topic 3: health condition after vaccination | 5963 (3.59) | AE, head, mental, sick, all day, stunned, feeling, condition, and pain |
Topic 21: vaccination day | 5807 (3.50) | Friday, work, company, vacation, Saturday, weekend, Monday, school, Thursday, and booster shot |
Topic 36: no-show vaccine reservation success story | 5626 (3.39) | No-show vaccine, success, application, residual, hospital, alarm, neighborhood, Naver, ticketing, and waiting |
Topic 40: taking analgesics to control pain and fever | 5625 (3.39) | Tylenol, momsal, morning, muscle pain, head, chills, pain killers, headaches, taking, and sick |
Topic 18: how to book a no-show vaccine | 4725 (2.85) | Hospital, text, call, change, no-show vaccine, information, application, contact, date, and select |
^a Number of tweets assigned to each topic; because 1 tweet can belong to >1 topic, each tweet was assigned to the topic with the highest probability.
^b The word “momsal” is a condition caused by extreme fatigue in which one’s body aches and suffers from exhaustion or fever.
^c AE: adverse event.
^d The word “Geumgangmakgi” is a traditional Korean taekwondo technique that features a defensive posture with arms raised.
^e The word “Giraegi” is a combination of gija, the Korean word for journalists, and tsuraegi, the Korean word for trash.
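The proportions in the table rest on assigning each tweet to its single most probable topic and then counting. A minimal sketch of that step follows, assuming each tweet is represented by a list of topic probabilities; the function names and the toy probabilities are illustrative only.

```python
from collections import Counter


def dominant_topic(topic_probs):
    """Index of the topic with the highest probability for one tweet."""
    return max(range(len(topic_probs)), key=lambda k: topic_probs[k])


def topic_proportions(doc_topic_probs):
    """Map topic index -> (tweet count, percentage of all tweets)."""
    counts = Counter(dominant_topic(p) for p in doc_topic_probs)
    total = len(doc_topic_probs)
    return {k: (n, round(100 * n / total, 2)) for k, n in sorted(counts.items())}


# Toy example: 4 tweets, 3 topics.
example = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.5, 0.4, 0.1],
    [0.2, 0.3, 0.5],
]
proportions = topic_proportions(example)
```

The same arithmetic reproduces the table entries: 15,193 of 165,984 tweets corresponds to 9.15%.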
Topics 18, 21, and 36 were related to vaccine access. The Korean government has recommended paid sick leave in workplaces or official absence from schools on the day of vaccination. Thus, people preferred to get their vaccines on Thursdays or Fridays to take a break until the weekend (topic 21). No-show vaccines could be reserved on a first-come, first-served basis through a specific website (eg, Naver) or by phone call to designated clinics. Topics 36 and 18 pertained to web-based and phone reservations, respectively. Topic 4 concerned the news media and the governments of Korea and the United States. We also performed topic modeling by dividing the data into quarterly periods (data not shown). The results confirmed that the topics about vaccination changed over time and that AE-related topics were more prevalent in the later period than in the initial period of vaccination.
Hierarchical topic modeling of the COVID-19 vaccine discourse on Twitter in Korea. AE: adverse event; KFDA: Korea Food & Drug Administration.
Research question 3 concerns specific topics with respect to vaccine brands.
Approximately 60% (95,857/165,984) of the tweets mentioned ≥2 vaccine brands together. Thus, tweets were classified into topics by vaccine brands based on the highest topic probability calculated by pinning topic modeling to explore public opinion (
The average sentiment score of topics with keywords related to vaccine AEs was significantly lower than that of the other topics (ANOVA and Tukey post hoc test,
Time trend of sentiment score about COVID-19 vaccine on Twitter in Korea.
Top 20 topic words and average sentiment scores for various COVID-19 vaccine brands on Twitter in Korea (N=165,984).
Topics | Proportion, n (%)^a | Topic words | Average sentiment score, mean (SE^b) |
Topic 0: Pfizer | 46,274 (27.88) | AE^c, muscle pain, Tylenol, symptoms, headache, momsal^d, pain, menstruation, ache, and progress | −1.04 (1.31) |
Topic 1: Moderna | 53,524 (32.25) | No-show vaccine, AE, booster shot, hospital, mom, worry, friend, dad, cross-vaccination, and doctor | −0.33 (1.20) |
Topic 2: AstraZeneca | 22,045 (13.28) | Effectiveness, AE, variant, booster shot, prevention, approval, blood clot, antibody, United States, and virus | −0.48 (0.97) |
Topic 3: Janssen | 13,555 (8.17) | Hospital, no-show vaccine, inoculation, United States, application, booster shot, prevention, confirmation, advance reservation, and civil defense | −0.28 (0.90) |
Topic 4: Novavax | 23,506 (14.16) | Government, Korea, supply, production, Moon Jae-in, United States, people, contract, Japan, and secure | −0.37 (1.02) |
Topic 5: unspecified | 7080 (4.27) | Death, AEs, health, report, adverse reaction, women, causality, heart, examination, and investigation | −0.82 (1.17) |
^a Number of tweets assigned to each topic; because 1 tweet can belong to >1 topic, each tweet was assigned to the topic with the highest probability.
^b All mean values between COVID-19 vaccine brands exhibit significant differences according to ANOVA and Tukey post hoc test at
^c AE: adverse event.
^d The word “momsal” is a condition caused by extreme fatigue in which one’s body aches and suffers from exhaustion or fever.
Proportions of positive, neutral, and negative sentiments by COVID-19 vaccine brands.
Time trend of sentiment score by COVID-19 vaccine brands.
We conducted a subgroup analysis of tweets with terms related to AEs, including side effects, symptoms, AE, and AE reporting (
Topics of tweets with terms related to adverse events (AE; N=15,371).
Topics | Proportion, n (%)^a | Topic words |
Topic 0: systemic reaction | 3839 (24.98) | Muscle pain, headache, momsal^b, pain, aches, mild fever, chills, fever, fatigue, and cold |
Topic 1: local allergic reaction | 931 (6.06) | Armpit, hive, chest, leg, allergy, lymph node, skin, numbness, rash, and lump |
Topic 2: palpitation | 612 (3.98) | Heart, exercise, palpitation, man, woman, chest, caffeine, coffee, overwork, and eyesight |
Topic 3: myocarditis and pericarditis | 1115 (7.25) | Heart, chest, emergency room, myocarditis, pain, dyspnea, chest pain, pain killer, pericarditis, and allergy |
Topic 4: irregular menstruation | 1121 (7.29) | Irregular bleeding, menstrual pain, vaginal bleeding, menstrual cycle, menstrual irregularity, anxiety, premenstrual syndrome, bleeding, due date, and menstrual volume |
Topic 5: changes in appetite and sleep | 1522 (9.90) | Appetite, fatigue, explosion, insomnia, increase, sleep, hunger, sleepiness, digestion, and improve |
Topic 6: suspicion of serious side effects | 383 (2.49) | Death, suspicion, blood cancer, Doo-hwan Chun^c, government, leukemia, AE, dad, health, and cerebral hemorrhage |
Topic 7: case report about death or thrombosis | 1412 (9.19) | Death, myocarditis, occurrence, case, report, blood clot, approval, thrombosis, risk, and death |
Topic 8: discontinuation of vaccination | 456 (2.97) | United States, Korea, government, problem, article, disposal, health authority, Japan, suspension, and order |
Topic 9: effectiveness of vaccines | 381 (2.48) | Effect, antibody, booster shot, infection, immunity, variant, confirmation, prevention, virus, and Omicron |
Topic 10: information on side effects | 760 (4.94) | Hospital, AE, symptom, physician, no-show vaccine, phone, talk, information, explanation, and nurse |
Topic 11: concern of vaccination | 2839 (18.47) | Worry, mom, booster shot, friend, relief, suffering, dad, cross-vaccination, brother, and family |
^a Number of tweets assigned to each topic; because 1 tweet can belong to >1 topic, each tweet was assigned to the topic with the highest probability.
^b The word “momsal” is a condition caused by extreme fatigue in which one’s body aches and suffers from exhaustion or fever.
^c The word “Doo-hwan Chun” is the name of the former president.
This study showed that a wide range of topics regarding COVID-19 vaccines have been discussed on Twitter in Korea. The topics related to the COVID-19 vaccine were vaccine-related AEs; emotional reactions such as worries and appreciation for vaccination; vaccine development, supply, or application; and government vaccination policies. Among them, the most important and frequently mentioned topic was AEs related to COVID-19 vaccination. Vaccine-related AEs included systemic and local AEs, myocarditis or pericarditis, thrombus, irregular menstruation, changes in appetite and sleep, leukemia, and death. A topic modeling study in which weights were assigned to various vaccine brands found notable differences in the topics related to the various vaccine brands. The topics pertaining to the Pfizer vaccine mainly mentioned AEs, those related to Moderna and Janssen vaccines focused on vaccine access, those pertaining to AstraZeneca were related to vaccine effectiveness, and those regarding Novavax were issues related to vaccine production and supply. Although the sentiments toward COVID-19 vaccines changed over time, negative sentiments prevailed since the start of the vaccination. In terms of vaccine brands, the topics pertaining to the Pfizer vaccine expressed the strongest negative opinion.
The diffusion of new technologies changes the methods for data collection or the analysis of people’s thoughts, feelings, and actions [
In this context, research on public discourse and opinions on COVID-19 vaccines has mainly used Twitter data [
This study identified topics for other side effects that were not reported in the preapproval clinical trials. For example, changes in menstruation, appetite, and sleep have been reported. Reports on sleep changes are rare. One study showed that sleep duration increased after vaccination based on wearable device data [
In addition, Korean tweets indicated public suspicion of vaccination and the occurrence of leukemia. To date, there has been no evidence of an association between leukemia and COVID-19 vaccination. However, many Korean tweets mentioned that the occurrence of leukemia in the former President Doo-hwan Chun could be related to COVID-19 vaccination. Vaccine-related misinformation on social media platforms may exacerbate vaccine hesitancy [
Several other studies that analyzed sentiments toward COVID-19 vaccines reported positive public opinion [
Two possible explanations for these results are the compulsory vaccination policy and the experiences and concerns regarding AEs after vaccination. After the Omicron variant epidemic, the Korean government permitted only fully vaccinated people to use public places, such as restaurants, cafes, and movie theaters. Similarly, many nations have adopted mandatory vaccinations, including Australia, Brazil, Canada, France, Indonesia, Italy, and the United Kingdom [
The sentiment analysis by vaccine brand suggests that the Pfizer vaccine had the most negative score among the 5 vaccine brands, which is inconsistent with the results of phase 3 clinical trials and postmarketing surveillance. Initial trials of the Pfizer vaccine revealed no significant differences in side effects compared with other vaccine brands. Rather, it was more effective in preventing symptomatic COVID-19 than the AstraZeneca and Janssen vaccines [
To the best of our knowledge, few studies have comprehensively analyzed Korean tweets more than a year after the start of vaccination to determine people’s opinions and perceptions of COVID-19 vaccines. Our topic analysis provides a hierarchical view of the topics related to COVID-19 vaccines that are mainly discussed on a web-based social media platform. Moreover, we tracked the trend of sentiments toward COVID-19 vaccines over time. We also conducted quarter-based topic analyses to reflect the rapidly changing COVID-19 circumstances. Finally, we carefully refined the Korean tweets using various preprocessing methods to obtain high-quality results.
First, caution is needed in interpretation because the relationship between vaccination and response was not analyzed and the topics about AEs do not represent a causal relationship. Second, this study used only Twitter data; other social media platforms may contain different opinions because platform preferences vary with user characteristics. Third, because most social media users were young adults, our findings may not reflect the views of the entire population. Fourth, we dealt only with nouns for topic modeling; other parts of speech, such as adjectives, adverbs, and verbs, will be considered in future studies. Finally, our sentiment analysis was performed after English translation because no adequate tool was available for the Korean language and the analysis depended on the SentiStrength program. Thus, sentiment scores may have been influenced by the quality of the translation. Although we double-checked that the translation did not affect the quality of the sentiment analysis results, there could be ambiguity and uncertainty in the translation process, as indicated in the study by Huang et al [
Our results showed persistent public discourses about AEs after vaccination and predominantly negative sentiments on Twitter in Korea. These results suggest that accurate information regarding vaccine-related AEs should be communicated to the general public. In addition, a continuous analysis of public opinion, not a one-time event, is required, and crisis communication should be continuously conducted according to public opinion changes. In particular, the Pfizer vaccine had the most negative sentiment from the early period of vaccination among the five vaccine brands, showing that public opinion is not based on academic evidence. Misinformation on web-based platforms should be controlled properly from a public health perspective. Furthermore, this study on public discourse and opinions after large-scale vaccination over a short period can be a valuable resource for responding to outbreaks of other emerging infectious diseases.
Summary of topics and sentiments from previous studies on COVID-19 vaccines using social media data.
Detailed algorithms and descriptions of each phase of COVID-19 vaccine discourse analysis.
A list of excluded Twitter accounts.
The methods for selecting the optimal number of topics in latent Dirichlet allocation.
Comparison of sentiment analysis using original Korean texts and translated English texts.
The results of autoregression between sentiment score and time by COVID-19 vaccine brands.
adverse event
latent Dirichlet allocation
label intrusion
optimal label
term frequency–inverse document frequency
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (grants NRF-2018R1A6A1A03025109 and 2021R1I1A1A01059268).
None declared.