Evolution of the public opinion on COVID-19 vaccination in Japan

Vaccines are promising tools to control the spread of COVID-19. An effective vaccination campaign requires government policies and community engagement, sharing experiences for social support, and voicing concerns about vaccine safety and effectiveness. The increasing use of online social platforms allows us to trace large-scale communication and infer public opinion in real time. We collected more than 100 million vaccine-related tweets posted by 8 million users and used the Latent Dirichlet Allocation model to perform automated topic modeling of tweet texts during the vaccination campaign in Japan. We identified 15 topics grouped into 4 themes: Personal issue, Breaking news, Politics, and Conspiracy and humour. The evolution of the popularity of themes revealed a shift in public opinion: attention was initially split among personal issues (individual aspect), collecting information from the news (knowledge acquisition), and criticism of the government, and then moved towards personal experiences once confidence in the vaccination campaign was established. An interrupted time series regression analysis showed that the Tokyo Olympic Games affected public opinion more than other critical events but not the course of the vaccination. Public opinion on politics was significantly affected by various events, which shifted attention positively in the early stages of the vaccination campaign and negatively later. Tweets about personal issues were mostly retweeted when the vaccination reached the younger population. The associations between the vaccination campaign stages and tweet themes suggest that public engagement on the social platform contributed to speeding up vaccine uptake by reducing anxiety via social learning and support.


I. INTRODUCTION
Vaccination is an effective mechanism to reduce the number of hospitalisations and mortality caused by the emergent coronavirus disease (COVID-19). With the advent of efficient vaccines after the first wave of the COVID-19 pandemic, public health efforts turned to strategies to cost-effectively immunise the population to increase survival and restore economic activity. The availability of doses and uptake rates are fundamental aspects to reach a sufficient vaccination coverage, but those numbers varied across countries in the current pandemic. One particular concern was hesitancy about the safety and effectiveness of COVID-19 vaccines 1, which affected the individual willingness to get vaccinated not only in low- and middle-income 2,3 but also in high-income countries [4][5][6][7]. Japan stood out among developed economies as having one of the lowest vaccine confidence levels in the population 8. This results from safety concerns about the human papillomavirus (HPV) vaccine that emerged in the early 2010s after misinformation spread about its adverse effects 9,10, prompting the Japanese Ministry of Health, Labour and Welfare to suspend the proactive recommendation of the HPV vaccine from June 2013 until November 2021. Such low public confidence delayed the start of mass vaccination against COVID-19 by two months relative to the USA, China, and European countries, leading to safety concerns and questions about the Tokyo Olympic Games that had already been postponed to August 2021. Albeit late, Japan achieved high vaccination coverage in a short time, reaching one of the highest levels in the world (ranked 13th among 229 countries) with 77.8% (at least one dose) on the 31st of October 2021 11, ahead of early adopters such as the UK (73.3%), Germany (69.0%), and the USA (66.7%).
It is unclear how public opinion affected government policies that were also pressed by the domestic economic slowdown and the concerns about the Tokyo Olympic Games. On the other hand, public opinion typically reacts to policies and might serve as a barometer of government strategies. Monitoring public opinion, however, is challenging. The largest study of vaccination intention in Japan surveyed 30 000 participants 7 and found that a large proportion of the population was unsure (33%) or unwilling (11%) to take a COVID-19 vaccine, with side effects and safety being the main reasons. Classic survey studies like this are costly, relatively slow, and with few exceptions 8 cannot trace changes of public opinion in real time 2,6,7,12. Large-scale studies aiming to increase accuracy and the spatio-temporal resolution of responses require advanced survey techniques. In recent years, human activity has been increasingly mediated by digital devices, leaving footprints that can be exploited to assess the population health and opinions [13][14][15][16][17]. In the context of COVID-19, social media data have been used to predict the number of new cases (incidence) 18,19 and to interpret the public perception of the pandemic 20. Twitter has been particularly useful to monitor public opinion because users engage and react promptly to environmental changes, for example by reacting to epidemic outbreaks 21,22, expressing concerns about the disease 23, accepting the pandemic situation 24, or discussing vaccination issues [25][26][27].
FIG. 1. Data processing workflow.
We hypothesise that Twitter activity can be used as a barometer of public concerns and response to government policies on the mass vaccination campaign in Japan during the COVID-19 pandemic. Twitter is widely used in Japan, where more than 60% of the population below 40 years old is actively engaged 28. The pervasiveness of Twitter provides a unique source of data to monitor the evolution of public opinion during the various stages of the Japanese vaccination campaign. To test our hypothesis, we monitored over 8 million users (approx. 6.4% of the Japanese population) and collected over 100 million tweets (including 76 million retweets) written in Japanese from January 1 to October 31, 2021. Our sample covers the period from before the start of the vaccination campaign until weeks after the end of the Olympic Games in Tokyo. We perform topic modelling 29,30 by applying a methodology based on the Latent Dirichlet Allocation (LDA) model 31 to disentangle the textual information and find underlying semantic structures in tweets.

II. RESULTS
The original data set contains 114 357 691 vaccine-related tweets written in Japanese from January 1 to October 31, 2021. Our analysis is based on three samples containing either the original tweets or retweets. The first sample (sample 1) contains 24 032 297 tweets posted by 6 034 434 users and is used to study the evolution of the opinions, including disruptions due to critical events. A random sample of the original data (sample 2) is then used to identify the main topics and themes, and a sample of all retweets (sample 3) is used to study the spread of opinions (Fig. 1).

A. Vaccine Related Tweets
Figure 2A shows the number of (vaccine-related) tweets per day during the study period and highlights four critical events 32: 1) the launch of the COVID-19 vaccination campaign by the Japanese government, on February 17, focusing initially on essential workers (e.g. healthcare workers); 2) the start of vaccination of the elderly population (above 65 years old) on April 12; 3) the start of the general public vaccination on June 21; and 4) the Tokyo Olympic Games taking place from July 23 to August 8. The first peak occurs on January 21, when the Prime Minister Yoshihide Suga made a statement that "the high coverage of the vaccination is not a precondition for holding the Olympics in Tokyo" and the Ministry of Health, Labour and Welfare (MHLW) signed a contract with Pfizer Inc. to supply a total of 72 million doses of its COVID-19 vaccine. Although a spike is observed at the very start of the vaccination campaign (event 1), vaccine-related tweets started to increase after event 2, when the vaccination of nonessential workers began. That coincides with the outbreak of the 4th wave in Japan (early April, Fig. 2B) and is followed by increased interest during the peak of infections in the 4th wave (May 13), when the online booking of vaccine appointments was launched but became overwhelmed, leaving many people without a vaccination slot. There was a sharp relative decrease in the tweets at the start and end of the Olympic Games, followed by the largest peak on August 26, when a contamination scandal (approx. 1.6 million doses of the Moderna vaccine were discarded) was publicized. This last peak also coincides with the peak of the 5th wave and is followed by a substantial decrease in vaccine-related tweets, likely because of the high vaccination rate in the population (Fig. 2A,B).

B. Clustering Vaccine Related Tweets
The ranking of the most used words in vaccine-related tweets (sample 2) reveals that 242 627 (24%) of them explicitly contained the word "COVID-19" (SI Appendix, Table S1). While the prevalence of specific words in tweets can reveal patterns of popular words, this measure is unable to unveil hidden semantic relations among the tweets. We thus apply a machine learning methodology, the Latent Dirichlet Allocation (LDA) model, on a sample of 1 000 000 tweets (100 000 per month, sample 2) to automatically identify and classify (i.e. cluster) tweets into meaningful topics. This monthly sampling is used to remove the non-stationarity of the tweet activity given the imbalance in the number of vaccine-related tweets during the study period. Using LDA, we automatically identified 15 topics from the tweets (solely based on the textual content) and manually grouped them into four general themes: 1) Personal issue, 2) Breaking news, 3) Politics, and 4) Conspiracy and humour. Table I shows examples of representative tweets in each theme and Table S2 (SI Appendix) shows the most popular words in each topic.
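The monthly sampling described above can be illustrated with a minimal sketch. It assumes each tweet is available as a (month label, text) pair; the function name `sample_per_month`, the toy data, and the fixed seed are illustrative choices and not part of the original pipeline.

```python
import random
from collections import defaultdict

def sample_per_month(tweets, per_month, seed=0):
    """Draw up to `per_month` tweets uniformly at random from each month.

    `tweets` is an iterable of (month, text) pairs, where `month` is any
    hashable label such as "2021-01" (hypothetical format).
    """
    rng = random.Random(seed)
    by_month = defaultdict(list)
    for month, text in tweets:
        by_month[month].append(text)

    sample = {}
    for month in sorted(by_month):
        pool = by_month[month]
        k = min(per_month, len(pool))  # a month with few tweets keeps them all
        sample[month] = rng.sample(pool, k)
    return sample

# Toy data: a quiet month and a busy month.
toy = [("2021-01", f"jan-{i}") for i in range(5)] + \
      [("2021-05", f"may-{i}") for i in range(50)]
balanced = sample_per_month(toy, per_month=3)
```

Each month contributes the same number of tweets to the topic model, so busy months do not dominate the inferred topics.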
The most popular theme emerging from the topic analysis was Personal issue (49.8%). It is formed by two topics about personal issues before being vaccinated, i.e. personal view on the vaccination and personal schedule of vaccination, and four topics about personal experience after being vaccinated, i.e. a topic about live reporting on the vaccination experience (e.g. from the waiting room or on the way to/from the vaccination center), and three topics about individual vaccination experience including: 1) complaints about discomfort, side effects, and personal life after the vaccination; 2) reporting body temperature after taking the vaccine; and 3) advice to overcome side effects (Table I).
The second most popular theme was classified as Breaking news (21.3%). It includes two topics about news on COVID-19 vaccines, namely vaccine development and approval, and vaccine effectiveness, and a third topic about booking. The first topic includes tweets about the development of Moderna, AstraZeneca, and Pfizer vaccines (clinical trials and government approval) in Japan and other countries. The second topic is about the effectiveness of vaccines and contains information about mRNA vaccines, the effectiveness of vaccines against new variants, and serious side effects (e.g. thrombosis) of the AstraZeneca vaccine. The last topic is about booking an appointment for vaccination, in particular about the availability and whether users could successfully book a time slot or not (Table I).
Politics was classified as the third most popular theme (17.2%) with three topics. The first topic was related to opinions about the government. For instance, the users complained that the vaccination schedule in Japan was behind other countries and disagreed on holding the Tokyo Olympic Games given the low vaccination coverage. Opinions about the mass media, such as the complaints about the unreliable information from the media and the attitude of the press inciting unrest, formed the second topic. Finally, the vaccination policy, including casual chats, e.g. tweets mentioning the assignment of Mr Taro Kono (a politician famous among the young population) as vaccine Minister, formed the third topic (Table I).
The least popular theme contained topics related to Conspiracy and humour (11.8%). The first topic is about the control of the population, as for example the conspiracy theory that "the purpose of COVID-19 vaccination was to reduce the global population", whereas the second topic was about effects on the body, as for example that "COVID-19 vaccines are a ploy to connect people to the 5G network". Internet memes formed the third topic, as for example the popular "Vac-vac-cine-cine" (from "vaccine vaccine" because a person needs two vaccine shots to be fully vaccinated and because the combination of these words sounds like "exciting" and "male genitalia" in Japanese) (Table I).

C. Evolution of the Popularity of Themes
Previous research has shown that the number of tweets about a particular topic reflects the users' attention to that topic 33,34. We thus estimated the popularity of tweets for each topic (grouped in 4 major themes, see previous section) to monitor temporal changes in the interest of users (Fig. 3A). Personal issue (Theme 1) increased continuously, from nearly 30% to over 70% by the end of the study period. Breaking news (Theme 2) and Politics (Theme 3), on the other hand, declined steadily, from nearly 30% and 25%, respectively, to less than 10%, dropping more markedly after June, when vaccination became available for people under 65 years old (the majority of Twitter users). Conspiracy and humour (Theme 4) also decreased slightly in the period and overall remained relatively low. We further validated this analysis by creating a subset of keywords for each theme (SI Appendix, Table S3) and then extracting all tweets of each theme from the original data set (24 million tweets) (SI Appendix, Fig. S1A). The linear regression analysis (SI Appendix, Fig. S1B and Table S4) showed a statistically significant increase in the tweets about Personal issue (Theme 1) and a decrease in the other themes, with Breaking news (Theme 2) and Politics (Theme 3) decreasing five times faster than Conspiracy and humour (Theme 4). These trends reveal a shift in the concerns of Twitter users, who initially split their attention among personal issues (individual aspect), collecting information from the news (knowledge acquisition), and government decisions (the course of the vaccination campaign), and then focused mostly on personal issues once the vaccination campaign was effectively implemented in the general population.
The evolution of specific topics reflects finer aspects of the opinion dynamics. The combined topics about personal issues before being vaccinated (i.e. personal view and personal schedule) increased after May, followed by a slight decrease after August (Fig. 3C). This pattern reflects increasing concerns with the vaccination and the Tokyo Olympic Games that ended in early August. The combined topics about users' experience after being vaccinated (i.e. live reports, journal, perception, and preparation) showed a sharp increase after June, when the vaccination of the general population began (Fig. 3C). In contrast, conspiracy theories (population control and effect on the body) decreased steadily, suggesting that education built up confidence in the vaccines (Fig. 3D). Opinions about the booking of vaccination appointments peaked in May, when the booking system was launched. Opinions about politics peaked in April and then decreased substantially, reflecting an initial criticism towards the government for the late implementation of the mass vaccination, followed by approval once the campaign rolled out. Again, we validated these findings by extracting the corresponding tweets using a subset of keywords for each topic or aggregated topic (SI Appendix, Table S3) and confirmed the trends (SI Appendix, Fig. S2), with a low prevalence of words related to conspiracy theories (5.8%). This result also confirms that the initial concerns about the government and the reliability of the vaccines became secondary once the vaccination reached most of the population, and personal experiences became dominant.
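The popularity measure used in this section, the percentage of tweets per theme in each month, can be sketched as follows. The sketch assumes every tweet has already been assigned a month label and a theme; the function name `theme_popularity` and the toy counts are illustrative only and do not reproduce the reported values.

```python
from collections import Counter, defaultdict

def theme_popularity(labelled_tweets):
    """Return {month: {theme: percentage of that month's tweets}}."""
    counts = defaultdict(Counter)
    for month, theme in labelled_tweets:
        counts[month][theme] += 1

    popularity = {}
    for month, theme_counts in counts.items():
        total = sum(theme_counts.values())
        popularity[month] = {
            theme: 100.0 * n / total for theme, n in theme_counts.items()
        }
    return popularity

# Toy labels (hypothetical): (month, theme) pairs.
toy = (
    [("2021-01", "Personal issue")] * 3 + [("2021-01", "Politics")] * 1
    + [("2021-10", "Personal issue")] * 4 + [("2021-10", "Politics")] * 1
)
shares = theme_popularity(toy)
```

Because the shares within a month sum to 100%, a rise in one theme necessarily lowers the combined share of the others, which is worth keeping in mind when reading the trends.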

D. Shift in Interest at Critical Events
Specific events may have social and individual consequences and affect the public opinion and discussion of different themes. Four critical events marked the vaccination campaign in Japan during 2021: the three stages of the vaccination campaign and the Tokyo Olympic Games (Fig. 2). To test the effect of critical events on the opinion dynamics, we performed an interrupted time series regression 35 to estimate the changes in the popularity of themes (see SI Appendix text for details). In this analysis, the level parameter indicates a shift in (the relative) attention whereas the slope indicates the rate of change in the popularity of a given theme. Table II shows that Politics (Theme 3) was the theme most affected by these events. The impact of the start of the vaccination of the general population and the Tokyo Olympic Games on public opinion was larger than that of the other critical events, and they affected all aspects of public opinion. The vaccination of health workers positively shifted and accelerated the popularity of the politics theme, likely because of increasing expectations on rolling out mass vaccination. The vaccination of the elderly population only positively shifted the trend. On the other hand, both the vaccination of the general population and the Tokyo Olympic Games negatively shifted the interest in politics, suggesting relatively fewer concerns with government policies. Furthermore, the start of the vaccination of the general population increased the rate of tweets about practical advice, personal experience, and news about the reliability of the vaccine. Finally, the start of the Tokyo Olympic Games caused a shock of interest in personal issues that remained nearly constant afterwards, likely because of the large vaccination coverage achieved during this period (SI Appendix, Fig. S3).
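To make the level and slope parameters concrete, the following sketch fits a common single-event segmented regression, y_t = b0 + b1 t + b2 D_t + b3 (t - t0) D_t, where D_t is 1 from the event time t0 onwards. This specification is an assumption; the exact model used in the study is described in the SI Appendix and may differ. The function names and the noise-free toy series are illustrative.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_its(y, t0):
    """Ordinary least squares fit of the segmented model via normal equations."""
    rows = []
    for t in range(len(y)):
        d = 1.0 if t >= t0 else 0.0          # post-event indicator
        rows.append([1.0, float(t), d, d * (t - t0)])
    p = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(p)]
    b0, b1, b2, b3 = solve(xtx, xty)
    return {"intercept": b0, "pre_slope": b1,
            "level_change": b2, "slope_change": b3}

# Toy series: popularity rises by 0.5 per step, then drops by 5 at t0 = 10
# and grows 0.3 per step more slowly afterwards.
y = [20 + 0.5 * t for t in range(10)] + \
    [20 + 0.5 * t - 5 - 0.3 * (t - 10) for t in range(10, 20)]
fit = fit_its(y, t0=10)
```

In this parametrisation a negative `level_change` corresponds to an immediate drop in attention at the event, and a negative `slope_change` to a slower growth (or faster decline) afterwards. Significance testing, as reported in Table II, would additionally require standard errors, which this sketch omits.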

E. Spread of Opinions
A tweet is a unidirectional process of sharing information with the community. Retweeting, on the other hand, is a social process where users engage and share tweets to spread opinions on their own social network 36. The analysis of 75 984 321 retweets by 3 917 181 users (sample 3) showed a higher prevalence of retweets about Personal issue (Theme 1) and Politics (Theme 3) in comparison to Breaking news (Theme 2) and Conspiracy and humour (Theme 4) (Fig. 3B). Those observations align with the theory of complex contagion, since users mostly engaged with tweets (by retweeting) related to personal experiences and political opinion rather than tweets sharing hard-to-verify information, such as vaccine reliability and conspiracy theories, that might have negative consequences and affect the credibility of the user retweeting 37. Similar to the popularity of certain topics, the social process is also intensified at certain periods (SI Appendix, Fig. S4). For instance, the topic of booking an appointment exhibits a sharp peak in May, coinciding with the popularity of this topic, whereas the topic of politics declined after April, when the vaccination of the elderly population started.

III. DISCUSSION
The first year of the COVID-19 pandemic was marked by a rush to control spread and develop efficient vaccines. Although most high-income countries pushed to launch mass vaccination as early as possible, the government of Japan was criticized for not reacting promptly, particularly given public concerns with the Tokyo Olympic Games that had been postponed to the summer of 2021. Part of the delay was associated with fears that the Japanese population could resist vaccination, given past experiences with HPV vaccines 9. Twitter provides a platform to monitor in real time the public debate and engagement in topics of relevance to health and policy, and not least social and economic implications of government decisions 33. We leveraged the textual information in tweets and performed a topic analysis of 114 357 691 vaccine-related tweets to identify 15 topics further grouped into four major themes: 1) Personal issue, 2) Breaking news, 3) Politics, and 4) Conspiracy and humour during the vaccination campaign in Japan. We found a major shift in public interest, with users splitting their attention among various themes early in the campaign and then focusing on personal issues, as trust in vaccines and policies built up with an effective vaccination campaign. Increased trust helped to reduce the prevalence of tweets about conspiracy and humour, which have a negative impact on vaccine uptake 38. Previous studies using social media (Twitter and Reddit) to study the public perception of COVID-19 vaccination in different countries were limited in sample size and did not cover the whole vaccination campaigns. Therefore, only topics related to breaking news [25][26][27] and politics 25 were identified. We show, however, that personal issues are common topics and fundamental for an effective campaign due to social support in times of uncertainty. The interrupted time series regression analysis showed that the start of vaccination of the general population and the Tokyo Olympic Games affected public opinion more than other critical events. Public opinion on politics was the most significantly affected debate, with attention shifting positively early in the vaccination campaign and negatively later. In addition, a social dialogue was maintained, with tweets about personal issues mostly retweeted when the vaccination reached the adult population, which is the most active user group on Twitter.
The online data set is a convenience sample and thus the study population is limited to those using Twitter in Japan. To minimize potential sampling biases, we re-sampled the original data of 6 million users to remove temporal effects. Unlike standard survey studies, we were unable to collect sociodemographic information and thus could not stratify the analysis by age group, location, education, or gender 3,7. Stratification would help us to assess the extent to which certain social groups (e.g. adults vs. elderly) and locations (e.g. Tokyo during the Olympic Games) were affected. Furthermore, the inclusion criterion of tweets with the keyword "vaccine" may have captured tweets not relevant to COVID-19, as for example those tweets related to HPV vaccine or pet vaccination. To assess this aspect, we manually reviewed tweets and found that most of them were not contaminated by discussion of other types of vaccines. Our methodology allowed us to cluster and monitor the evolution of public opinion, which moved from an exploratory phase of gathering information and criticizing the government to social support by sharing personal experiences once confidence in the vaccination was established and practical issues became relevant.

A. Data Collection
We downloaded all the tweets written in Japanese including the word "waku-cine" (vaccine in Japanese) posted between January 1, 2021 and October 31, 2021. The data set was provided by the NTT DATA Corporation (https://www.nttdata.com). The study period was chosen to span from a few weeks before the launch of the vaccination campaign in Japan (February 17, 2021) until a few weeks after the end of the Tokyo Olympic Games, when the vaccination rate reached 75% of the Japanese population (October 17, 2021). The data set contains 114 357 691 tweets. We extracted all the tweets except "quote tweets" or "mentions" to other tweets. We further collected the tweet text, the time stamp (posting time), and whether the tweet was original or a retweet. Using data from Our World in Data (https://ourworldindata.org), we obtained the daily incidence (number of new cases) of COVID-19 and the vaccination rate (the percentage of the population who received at least one dose) of COVID-19 vaccine in Japan 11.

B. Data Processing
The data processing and analysis were performed using Python software, version 3.9.7 (Python Software Foundation). We first manually identified and removed all the tweets posted by bots, then extracted the plain text from the remaining tweets and removed emojis. Afterwards, we segmented each text into Japanese words using the morphological analyzer MeCab 39 and removed stop words that have little analytic value, e.g. "kore", "sore", "suru", meaning respectively "this", "it", and "do" in Japanese. Finally, we changed words to their root forms, e.g. "boku" to "watasi" ("I" in Japanese) or "Utta" to "Utsu" ("inject" in Japanese). This normalization corresponds to e.g. "viruses" to "virus" or "went" to "go" in English.
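The cleaning steps above can be outlined in a short sketch. It is not the authors' code: the tokenizer is a plain whitespace split standing in for MeCab, the romanised tokens stand in for Japanese text, the emoji pattern is a coarse approximation, and the names `preprocess`, `STOP_WORDS`, and `ROOT_FORMS` are hypothetical. Only the stop words and root-form pairs quoted in the text are included.

```python
import re

# Coarse emoji pattern covering the main pictograph blocks; not exhaustive.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")

STOP_WORDS = {"kore", "sore", "suru"}            # examples quoted in the text
ROOT_FORMS = {"utta": "utsu", "boku": "watasi"}  # examples quoted in the text

def preprocess(text, tokenize=str.split):
    """Remove emojis, segment into words, drop stop words, map to root forms.

    `tokenize` is a placeholder: in practice a morphological analyzer
    would segment the Japanese text into words.
    """
    text = EMOJI_RE.sub(" ", text)
    tokens = [tok.lower() for tok in tokenize(text)]
    tokens = [tok for tok in tokens if tok not in STOP_WORDS]
    return [ROOT_FORMS.get(tok, tok) for tok in tokens]

tokens = preprocess("kore wo Utta \U0001F600")
```

Passing the tokenizer as an argument keeps the cleaning logic independent of the segmentation tool, so the same function could be reused with a real morphological analyzer.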

C. Topic Modeling
The statistical Latent Dirichlet Allocation (LDA) model 31 implemented in the Gensim Python package 40 was used to identify topics in the Twitter data. Before the topic modeling analysis, we removed rare words, i.e. words appearing in fewer than 1 000 tweets, which corresponds to 0.0004% of the tweets, and the most frequent words "waku-cine" (vaccine) and "sessyu" (vaccination). To determine the number of topics, we trained LDA models with different numbers of topics and selected the number that maximized the topic coherence score 41, a robust measure of how meaningful (i.e. interpretable) topics are to humans.
[...] S2) in frequently retweeted tweets. (A) Theme 1: Personal issue, (B) Theme 2: Breaking news, (C) Theme 3: Politics, and (D) Theme 4: Conspiracy, Humour. Note that the topics in theme 1 (A) and 4 (D) are aggregated as in Fig. 3A and D. The topics of Before (After) being vaccinated represent the combined topics of personal view and personal schedule (live reports, journal, perception, and preparation). The topics of Conspiracy represent the combined topics of population control and effect on the body.
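The vocabulary pruning and the choice of the number of topics described in the topic modeling paragraph can be sketched without committing to a particular library. The sketch below is an illustration under stated assumptions: `prune_vocabulary` and `choose_num_topics` are hypothetical helper names, and `coherence_of` is a placeholder for a function that would train a topic model with a given number of topics and return its coherence score. The toy scores are invented for the example and are not results.

```python
from collections import Counter

def prune_vocabulary(docs, min_docs, drop=()):
    """Remove tokens found in fewer than `min_docs` documents, plus `drop`."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))   # count each token once per document
    banned = set(drop)
    keep = {tok for tok, n in doc_freq.items()
            if n >= min_docs and tok not in banned}
    return [[tok for tok in doc if tok in keep] for doc in docs]

def choose_num_topics(candidates, coherence_of):
    """Return the candidate topic count with the highest coherence score."""
    return max(candidates, key=coherence_of)

# Toy corpus of tokenised tweets (hypothetical tokens).
docs = [["waku-cine", "fever", "arm"],
        ["waku-cine", "fever"],
        ["waku-cine", "booking"]]
pruned = prune_vocabulary(docs, min_docs=2, drop=("waku-cine",))

# Stand-in scores; a real run would train one model per candidate.
toy_scores = {2: 0.1, 3: 0.3, 4: 0.2}
best_k = choose_num_topics([2, 3, 4], toy_scores.get)
```

Dropping the search keyword itself is useful because a word present in every tweet carries no information for separating topics.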

FIG. 2. Vaccine-related tweets, vaccination rate, and incidence of COVID-19. (A) Number of vaccine-related tweets per day written in Japanese (black, left y-axis) and fraction of the vaccinated population in Japan (magenta, right y-axis) between January 1 and October 31, 2021. (B) Daily incidence of COVID-19 in Japan. Vertical lines indicate four main events during the study period: 1) launch of the COVID-19 vaccination campaign for essential workers; 2) start of vaccination for the elderly population (above 65 years old); 3) start of vaccination for the general population (under 65 years old); and 4) the period of the Tokyo Olympic Games.

FIG. 3. Popularity of the themes or topics. Each line represents the percentage of tweets in each theme or topic over time. The percentage is calculated monthly considering (A), (C), and (D) a sample of the vaccine-related tweets (1 million tweets: sample 2), and (B) only frequently retweeted tweets (i.e. retweeted more than 10 times in a day).
FIG. S3. Impact of social events on the popularity of the themes. We first calculate the popularity (i.e., the percentage of tweets) of four themes defined by subsets of keywords (Table S3), theme 1: Personal issue (magenta), theme 2: Breaking news (cyan), theme 3: Politics (green), and theme 4: Conspiracy, Humour. We then applied the interrupted time series analysis to the time series of the popularity of each theme for four major events during the vaccination period: (A) Vaccination start for health workers, (B) Vaccination start for older people, (C) Vaccination start for general population (under 65), and (D) Start of Olympic games in Tokyo.

TABLE I. Topics automatically identified on vaccine-related tweets before and during the COVID-19 vaccination campaign in Japan.
"To the idiots in the government: if you can inoculate corona vaccine to all Japanese citizens, you can hold Olympic and Paralympic, but if you can't, cancel them."

TABLE II. Changes in the popularity of each theme at critical events, i.e. the start of each stage of the vaccination campaign and the Tokyo Olympic Games. Statistically significant changes (p-value < .05) are highlighted.

TABLE S2. Top contributing words for each topic identified by the topic model (LDA).