Public Engagement and Government Responsiveness in the Communications About COVID-19 During the Early Epidemic Stage in China: Infodemiology Study on Social Media Data

Background Effective risk communication about the outbreak of a newly emerging infectious disease in the early stage is critical for managing public anxiety and promoting behavioral compliance. China has experienced the unprecedented epidemic of the coronavirus disease (COVID-19) in an era when social media has fundamentally transformed information production and consumption patterns. Objective This study examined public engagement and government responsiveness in the communications about COVID-19 during the early epidemic stage based on an analysis of data from Sina Weibo, a major social media platform in China. Methods Weibo data relevant to COVID-19 from December 1, 2019, to January 31, 2020, were retrieved. Engagement data (likes, comments, shares, and followers) of posts from government agency accounts were extracted to evaluate public engagement with government posts online. Content analyses were conducted for a random subset of 644 posts from personal accounts of individuals, and 273 posts from 10 relatively more active government agency accounts and the National Health Commission of China to identify major thematic contents in online discussions. Latent class analysis further explored main content patterns, and chi-square for trend examined how proportions of main content patterns changed by time within the study time frame. Results The public response to COVID-19 seemed to follow the spread of the disease and government actions but was earlier for Weibo than the government. Online users generally had low engagement with posts relevant to COVID-19 from government agency accounts. The common content patterns identified in personal and government posts included sharing epidemic situations; general knowledge of the new disease; and policies, guidelines, and official actions. However, personal posts were more likely to show empathy to affected people (χ21=13.3, P<.001), attribute blame to other individuals or government (χ21=28.9, P<.001), and express worry about the epidemic (χ21=32.1, P<.001), while government posts were more likely to share instrumental support (χ21=32.5, P<.001) and praise people or organizations (χ21=8.7, P=.003). As the epidemic evolved, sharing situation updates (for trend, χ21=19.7, P<.001) and policies, guidelines, and official actions (for trend, χ21=15.3, P<.001) became less frequent in personal posts but remained stable or increased significantly in government posts. Moreover, as the epidemic evolved, showing empathy and attributing blame (for trend, χ21=25.3, P<.001) became more frequent in personal posts, corresponding to a slight increase in sharing instrumental support, praising, and empathizing in government posts (for trend, χ21=9.0, P=.003). Conclusions The government should closely monitor social media data to improve the timing of communications about an epidemic. As the epidemic evolves, merely sharing situation updates and policies may be insufficient to capture public interest in the messages. The government may adopt a more empathic communication style as more people are affected by the disease to address public concerns.


Introduction
Background On December 31, 2019, a cluster of pneumonia cases of unknown etiology were first reported in Wuhan, the capital city of Hubei Province, China [1]. The causative pathogen was soon identified as a novel coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [2] and the disease caused by SARS-CoV-2 was termed the coronavirus disease  [3]. Epidemiological investigation of the first 41 laboratory-confirmed human cases revealed that most had a history of visiting a seafood wholesale market (Huanan Market) in Wuhan where live wild animals were also sold for human consumption [4], but later, human-to-human transmission was confirmed, as cases without a history of visiting the market increased dramatically [5][6][7]. The outbreak of COVID-19 in Wuhan rapidly evolved into a severe pneumonia epidemic nationally and later worldwide. As of May 13, 2020, a total of 84,458 confirmed human cases of COVID-19 including 4644 deaths in China were reported to the World Health Organization [8]. Worldwide, COVID-19 has been a pandemic affecting more than 200 countries or territories with a total of 4,170,424 human cases of COVID-19 including 287,399 deaths as of May 13, 2020.

Communication About Outbreaks
The outbreak of a newly emerging respiratory infectious disease usually puts individuals at a high risk of infection and constitutes a highly uncertain situation that changes rapidly, threatening serious potential loss and prompting considerable psychological distress [9,10]. Feelings of uncertainty provoke great public anxiety, which if not properly addressed can develop into public panic and herd behaviors that may harm social order and population health [11,12]. Effective outbreak communication, particularly at the early stage, becomes critically important for dealing with excessive public fear, promoting risk awareness, empowering the public in taking protective actions, and gaining public confidence and trust [11,13,14]. Conceptual models for guiding outbreak communications have been developed since the 2003 outbreak of severe acute respiratory syndrome (SARS) [14][15][16][17] and used in empirical research for examining communication practices during the 2009 influenza A/H1N1 pandemic [14], the 2013-2016 Ebola outbreak in West Africa [18], and most recently the outbreak of COVID-19 [19]. These communication models and empirical research indicate that effective outbreak communication should be prompt and transparent, dynamic as the situation evolves to meet changes in public needs, relevant and able to engage the community, and empathic and caring to address public emotional distress [14][15][16][17][18][19].

Social Media as a Platform for Outbreak Communication
The high penetration of internet use and rapid development of information and communication technologies have made the internet an increasingly important health information source worldwide [20][21][22][23]. Reading, commenting, sharing, and seeking health information from social media, particularly through a mobile device, has become an increasingly important pattern of health information consumption in China [20,24,25]. During an epidemic, social media can facilitate the spread of epidemic awareness, attitudes toward control and preventive measures, emotional responses and behaviors, as well as misinformation and rumors in the public through online interactivity [26][27][28][29][30]. As the epidemic evolves, this may facilitate homogeneous mental representations of the epidemic, leading to collective behavioral responses [31]. In China, Sina Weibo (the Chinese version of Twitter) is one of the most popular platforms that attracted 486 million active monthly users in 2019 [32], most of whom accessed their user accounts through a mobile device [24]. Its microblogging function allows users to create and share short messages in a multimedia format, and other users can "share," "like," "comment," and "follow" the initial posts. Numerous government agencies in China also make use of Weibo to communicate with the public. As of June 2019, there were a total of 139,270 verified government microblogs in Weibo [24].
With the proliferation of internet and social media used as health information sources, infodemiology (or infoveillance), the study (or surveillance) of "distribution and determinants of information," in the internet or a population for the aim of guiding policy making and public health interventions [33] has been commonly used in the case of infectious disease outbreaks. Among various applications of infodemiology or infoveillance methods for social media data about infectious disease outbreaks, tracking information (concept) prevalence data [27,[34][35][36][37][38] and qualitatively analyzing and categorizing contents of the social media data [19,27,[36][37][38][39][40][41] are believed to be able to provide important insights into outbreak communications. However, existing studies mainly focused on tracking specific concepts or information such as blame [27], misinformation [34], stigma [30], specific keywords and sentiment [35], and organization trust and managing uncertainty [39], possibly due to specific research interests or using machine-aided analysis which does not allow flexible content analysis [42]. Moreover, existing studies mainly focused on one side of the outbreak communication, either the public response or response of health authorities [19,27,[34][35][36][37][38][39][40][41]. This only provides partial understanding about the interactivity between the public and health authorities, whereas two-way communication is believed to be crucial for effective outbreak communication [40]. Although a recent infoveillance study on Weibo data about COVID-19 provided some qualitative descriptions about the potential interactions between the public and government online about COVID-19 by time, the study did not quantify public engagement and government responsiveness regarding COVID-19 and how these changed by time as the pandemic unfolded [38]. In addition, this study's qualitative results seemed to lack a clear structure for understanding public perceptions and emotions [38]. The study conducted by Chew and Eysenbach [37] provides more comprehensive methods for guiding coding of social media data related to discussion topics, emotions, and online behaviors as well as quantifying and tracking the changes of these contents as the epidemic unfolds.

Study Objectives
The outbreak of COVID-19 was an unprecedented epidemic that China experienced for the first time in this digital era. Based on this study's literature review on important principles of effective outbreak communication and the knowledge gaps of current infodemiology or infoveillance studies examining outbreak communication using social media data, this study aimed to make use of the Weibo data about COVID-19 from December 2019 to January 2020 in China to answer the following research questions: 1. How did the public respond to the outbreak of COVID-19 as cases of COVID-19 increased and increasingly stringent containment measures were implemented, and how quickly could the government respond to public discussions about COVID-19? 2. To what extent could government messages about COVID-19 engage the general online users? 3. What contents did the public discuss online, and how did these contents change as the epidemic evolved? To what extent could the government respond to the temporal change of public discussions online?

Data Extraction
Four keywords in Chinese characters were used to capture data relevant to COVID-19 from Weibo from December 1, 2019, to January 31, 2020: "Wuhan pneumonia," "novel coronavirus," "novel coronavirus pneumonia," and "novel pneumonia." Data were retrieved using the built-in Weibo searching function and were subsequently screened by the Python Web Crawler, a tool that has been demonstrated to be efficient in identifying the most relevant posts that contain the set keywords [43]. A total of 1,028,204 relevant posts were initially retrieved. We also tried the keywords "unknown pneumonia" and "SARS" in Chinese characters to capture Weibo data from December 1, 2019, to January 9, 2020, when the etiology of COVID- 19 was not yet confirmed [2]. Since "Wuhan pneumonia" is a less specific term for COVID-19 before official announcement of unknown pneumonia in Wuhan on January 9, 2020, we manually checked the relevant posts by January 9, 2020, in the database.

Engagement Analysis
This study evaluated how much government posts can engage online users in the communications about COVID-19 by calculating the engagement index using engagement data of the posts delivered by government agency accounts [44,45]. The engagement data comprising likes, shares, and comments on the posts from each government account and the number of followers of these accounts were first extracted. These engagement data were then used to calculate the three metrics of social media engagement: popularity (likes per post and per 1000 followers), commitment (comments per post and per 1000 followers), and virality (shares per post and per 1000 followers); all three metrics were subsequently aggregated to generate the overall engagement index. Based on the engagement index, we identified the top five most active government agencies in both the health and nonhealth sectors. When ranking engagement, we specifically excluded any government account that delivered fewer than the average number of posts generated by all government accounts combined during the study period because these government accounts could rank high based on engagement index but were considered inactive in interacting with online audiences. We used mean but not the median number of posts as the cut-off because over 50% of government agencies only had 1 post during the study period. In addition, since the National Health Commission (NHC) of China is regarded as the lead agency in coordinating the national effort to combat the COVID-19 outbreak in China, its engagement data were included in comparison with other government agencies despite that its engagement index was not ranked in the top five.

Content Analysis
There is currently no consensus about how to best sample social media data for content analysis [42], but random sampling has been commonly used and seems to be suitable for social media data [46]. However, for random sampling, sample sizes differ a lot across studies due to different study purposes, duration of study period, resources, and whether data were coded automatically or manually [42]. Manual coding can generate richer information by accommodating new codes emerging during data analysis, but the sample size must be kept at a manageable level to avoid fatigue and improve accuracy in coding. Since we were interested in temporal changes in the discussion contents about COVID-19 among general online users, we focused on the personal account posts for content analysis. We first excluded personal account posts that lacked all engagement (no likes, comments, and shares) because these posts may have captured little attention and interest of other online users. Thereafter, 20 posts per calendar day between December 31, 2019, and January 31, 2020, were randomly selected for content analysis. In addition, 4 posts delivered by personal accounts (1 on December 29, 2019, and 3 on December 30, 2019) before the first official announcement of the unknown pneumonia cases in Wuhan on December 31, 2019, were included for content analysis. Therefore, a total of 644 personal account posts were finally included for content analysis. Interrater reliability was assessed by calculating Cohen kappa, with a value of 0.6 or above indicating adequate reliability. The set of codes finally generated were then constant compared to develop thematic categories.

Statistical Analysis
A Pearson chi-square test was used to compare the overall percentages of each thematic category between personal account posts and government agency posts. Latent class analysis (LCA) was used to explore main patterns of post contents by type of account. To conduct the LCA, we first generated a binary-coded variable (1=the specific thematic category is present and 0=the specific thematic category is absent) for each thematic category coded for each post. These variables were then entered into Mplus 7.3 (Muthén and Muthén) for LCA. Major fit indices provided by Mplus including Akaike information criterion (AIC), Bayesian information criterion (BIC), sample size adjusted BIC (aBIC), and the entropy value were used to determine the optimal number of class for the post contents of both personal accounts and government agency accounts. A model with smaller values of AIC, BIC, and aBIC but with a higher entropy value is preferred, but we also consider the interpretability of the model and model parsimony [47]. The LCA would finally determine the content pattern (ie, latent class) of each post. A chi-square for trend was then used to examine the temporal change of the proportion of each content pattern by week by type of account after checking the linearity of the distribution of proportion of each content pattern by week.
The temporal trend analysis excluded the 4 personal posts delivered before December 31, 2019. The LCA was conducted using Mplus 7.3, and other statistical analyses were conducted using STATA 15.1 for Windows (StataCorp LLC).

Weibo Activity
We detected information prevalence in relation to the outbreak by plotting daily numbers of Weibo posts by type of accounts with daily numbers of newly confirmed cases of COVID-19 in Figure 1. The Weibo "top search enquires" automatically identified by the built-in function of Weibo based on the number of relevant posts on that day were also marked on observable peak days of Weibo activity (Figure 1). There was a peak on December 31, 2019, when a cluster of unknown pneumonia cases in Wuhan were officially announced for the first time.
Another small peak was detected on January 9, 2020, when there was a hot discussion about naming the etiology of unknown pneumonia in Wuhan as the novel coronavirus. Weibo activity increased dramatically starting from January 18, 2020, when daily new confirmed cases of COVID-19 substantially increased. For personal account posts, a third peak was found on January 22, 2020, when the Wuhan government announced a policy of mandatory wearing of face masks in public places and human-to-human transmission of COVID-19 was announced by an expert who had just visited Wuhan for investigations, and the last peak within our study timeframe was detected on January 24, 2020, within 24 hours after Wuhan city was locked down. For posts from government agencies and media and commercial agencies, the third peak was not observed, and the last peak within the study time frame occurred 2 days later.
Overall, it appeared that public online reactions followed the spread of the disease and government actions, and responded earlier than the government.

Individual and Government Post Contents
Frequencies and proportions of thematic categories of post contents for personal and government accounts are shown in Table 2, and detailed descriptions of the thematic categories identified in our study can be found in Multimedia Appendix 1.
We noted 3 cyber-support behaviors from personal account posts: sharing knowledge or information, emotional exchange, and seeking information of which only the first 2 were identified in government posts. As is shown in Table 2, for sharing knowledge and information in both groups, the most common thematic categories were situation updates of COVID-19 followed by general knowledge about coronavirus pneumonia and advice on preventive measures. Government agency posts were more likely to share information about policies, guidelines, and official actions (χ 2 1 =14.5, P<.001), and instrumental support (χ 2 1 =32.5, P<.001), while personal account posts were more likely to share information on public responses to the epidemic (χ 2 1 =19.1, P<.001). Personal account posts were more likely to be classified as emotional exchange (χ 2 1 =30.5, P<.001) including showing empathy to affected people (χ 2 1 =13.3, P<.001), attributing blame to people or organizations for malpractice during the epidemic (χ 2 1 =28.9, P<.001), and expressing worry about the epidemic (χ 2 1 =32.1, P<.001). The government posts more likely praised people or organizations (χ 2 1 =8.7, P=.003). The main groups of people praised by both groups were health care workers, while the main people or organizations being blamed in personal account posts included other individuals (eg, individuals who consumed wild animals, breached the infection containment measures, and committed medical violence) and the government (individual government officers or government in general). Regarding seeking information, the main information sought was updates about the epidemic situation.  c Cell with expected frequency less than 5 and thereby P values from the chi-square test were not available.
Compared with post contents of government agencies from the health sector, we found that the four government agencies from the system of public security bureaus were more likely to share information about situation updates (χ 2 2 =15.9, P<.001), while the Hubei Red Cross Society of China were more likely to post contents about instrumental support (eg, donation of materials or money; χ 2 2 =25.3, P<.001) and showing empathy to affected people (χ 2 1 =25.7, P<.001).

Content Patterns and Temporal Changes
The The temporal change of each content pattern by type of account are shown in Figure 2. The chi-square for trend analyses found that for personal posts, proportions of posts sharing situation updates increased from week 1 (December 31, 2019-January 6, 2020) to week 3 (January 14-20, 2020; for trend, χ 2 1 =28.6, P<.001) but declined thereafter (for trend, χ 2 1 =19.7, P<.001). Sharing general knowledge (for trend, χ 2 1 =15.3, P<.001) and policies, guidelines, and official actions (for trend, χ 2 1 =15.3, P<.001) declined each week, while showing empathy and blaming increased significantly in later weeks (for trend, χ 2 1 =25.3, P<.001), and worry about the epidemic remained stable and low. For government agency account posts, sharing situation updates and prevention tips, and fighting against rumors remained stable; sharing general knowledge declined each week (for trend, χ 2 1 =14.7, P<.001), while sharing policies, guidelines, and official actions (for trend, χ 2 1 =8.9, P=.003), and providing instrumental and emotional empathy and praising people (for trend, χ 2 1 =9.0, P=.003) increased significantly on the last 2 weeks of the study time frame. We also specifically explored how providing reassurance, a main emotional content in government posts, changed as the epidemic evolved. We found that providing reassurance in government posts was more frequent in the first 3 weeks but declined in the last 2 weeks of the study time frame (for trend,

Principal Findings
The Weibo activity in the early stage of the COVID-19 epidemic showed that public reactions seemed to follow the spread of the disease and government actions in China. This again evidences the value of infodemiology or infoveillance studies to understand public response to the disease and government actions during the early epidemic stage despite a concern over the censorship of online information for propaganda purposes in China. There was 1 post blaming people who consumed wild animals for causing "Wuhan pneumonia" that was captured on December 29, 2019. On December 30, 2019, 3 more posts were identified: 2 seeking confirmation about unknown pneumonia cases detected in Hunan Seafood Market in Wuhan and 1 sharing information about unknown pneumonia cases found in the market. All 4 posts were from accounts of individuals who lived in Wuhan. This indicates that before the official announcement of the first cluster of unknown pneumonia in Wuhan on December 31, 2019 [1], relevant information had been spread in the public through interpersonal communication, social media, or other channels. The first government post was delivered on December 31, 2019, 2 days after the first individual post, indicating that, although the Chinese government's outbreak communication has been more timely and transparent compared with their response to the 2003 SARS outbreak [48], more efforts are required to improve early outbreak communication when uncertainty usually challenges communication. Early posts should be treated as alarms and responded to in a timely manner rather than being silenced, which could receive harsh criticism and worsen the epidemic control [49].
Our study evidences the use of social media among the Chinese government agencies in the communications about COVID-19 at the early epidemic stage, but the communication remained limited. The engagement analysis indicates that the public generally had low engagement with government agency posts, even those from the most active government agencies. In China, the NHC is expected to play a leading role in risk communication during an epidemic, while provincial and municipal health commissions are expected to report duties and provide epidemic information of their own provinces and cities, respectively, to the NHC. However, although the NHC and provincial health commissions have attracted a large number of followers on social media during the epidemic, compared with municipal health commissions, messages from these high-level health authorities were more disengaged by the general public. The generally low values of the engagement metrics popularity (likes per post per 1000 followers), commitments (comments per post per 1000 followers), and virality (shares per post per 1000 followers) may indicate low levels of interest, use, emotional arousal, or even credibility of the government information among the online audiences [50,51]. This may partly reflect a population inertia effect, where it takes a certain amount of time, or threat, before there is a noticeable change in the bulk practices of a population. The duration of such a period of inertia would be valuable to know.
The content analysis suggests that the Chinese government agencies mainly used social media to "inform" the public about updates of the epidemic situation; knowledge of the coronavirus pneumonia; policies, guidelines, and government actions; and prevention tips, all being the key risk messages included in the official websites of health authorities for communicating about an epidemic [40]. This suggests that government agencies mainly adopt a top-down approach in risk communication and use the social media for one-way communication. The temporal changes of content patterns of personal posts indicate that the public have less interest in situation updates, general knowledge, policies, and guidelines as the epidemic evolves. However, the public may feel more empathic with the affected people and angry about other individuals or the government who put people at risk, as an increasing number of people are affected by the disease and the control measures. The government seemed to continue frequently sharing situation updates; policies, guidelines, and official actions; and prevention tips despite a decline in public interest. However, we also observed a significant increase in the frequency of instrumental and emotional support in government posts as the epidemic evolved. This provides in-depth understanding about why sentiment analysis indicates a decline in negative sentiment but increases in positive sentiment as the epidemic unfolded [35]. The inconsistent temporal changes in content patterns between personal and government posts and insufficient emotional support of government posts indicates an overall inadequate government responsiveness to public concerns. Reassuring the public about the epidemic risk was one of the main emotional contents identified from the government posts and was particularly apparent in the first 2 weeks; even Weibo data indicated a generally low risk awareness and concern among the general public. This is consistent with the public response to the 2009 influenza A/H1N1 pandemic shown by Twitter data [37]. Our study indicates that this may be because the government overreassured the public at the early epidemic stage, which is against Sandman and Lanard's [52] guidelines that risk communication should lean toward the alarming side particularly when the situation is uncertain. However, reassurance should be provided as more people are infected to deal with excessive or prolonged fear. The increasing use of blame in personal posts as more people were infected coincides with Douglas' [53] idea that contemporary risk is highly politicized. As more people are affected, "whose fault?" becomes a primary question to seek the accountability of certain persons or organizations and make sense of the epidemic [27].
Our study found that the public blamed not only individuals who put others at greater risk during the epidemic but also government, particularly local government figures for their perceived failures in risk communication and control measures. This also aligns with Beck's works on global risk society, in which authorities are increasingly questioned and blamed for failing to protect individuals [54]. Current data reveals seldom use of conspiracy theories in the attribution of blame and blame as a way to spread rumors [55]. In contrast, the government agencies tend to praise people who had made contributions to the control of the epidemic. This is viewed as a blame-avoidance strategy, called heroization [55], to direct public outrage to the appreciation of another group of people. Working with heroes in outbreak communication may be critical to improving communication effectiveness.

Limitations
First, individual posts were sampled by randomly choosing equal numbers of posts per calendar day for content analysis rather than using the common probability-based random sampling, which would generate a large sample size due to the vast amount of social media data. However, there is currently no consensus about how to best sample social media data, and our sampling strategy was able to draw a random manageable sample using manual coding. Second, only posts of government agencies that were more actively engaged with the public in online communication about COVID-19 were included for content analysis. This means that the sample of government posts for content analysis may not be representative of all government posts. However, this sample was considered to have greater impact on online audience's knowledge, perceptions, attitudes, and behaviors due to greater audience interest and attention to their messages. Third, our study did not evaluate the responses to and concerns over COVID-19 among nonnetizens, particularly those living in rural areas and older people in China [24], and the Chinese government's communications about COVID-19 through other channels. In addition, our data only covered the first 5 weeks after unknown pneumonia cases in Wuhan were first officially announced, which is a relatively short but critical period for outbreak communication. Furthermore, due to the problem of censorship on Weibo data in China, our data based on keyword extraction may lose a considerate part of Weibo data, particularly those posted before the official announcement of the outbreak in Wuhan. This means that our data may not be able to accurately assess the delay of government responsiveness to the threat online.

Implications
The governments can closely monitor the social media discussions to identify public concerns to further improve government responsiveness in outbreak communication. To improve government responsiveness and public engagement, first, persons who have received training in risk communication can be designated for monitoring public concerns online, delivering timely messages, communicating about the uncertainty, and making efforts to address public concerns online. Second, the municipal health commission can communicate more locally relevant information to attract local followers' interest and motivate their information sharing. Third, the provincial and national health commissions can organize direct dialogues with online audiences on social media (eg, Weibo Chats) with trustworthy health care workers to capture audience's attentions and interests, and facilitate the rapid spread of fact-related messages [39]. Although the main role of the NHC in risk communication may be to disseminate facts, increasing messages showing empathy and care to affected people as the epidemic evolves is believed to be essential for maintaining credibility and trust in the public during a crisis [14].

Conclusion
The public seemed to respond earlier to the outbreak of COVID-19 online than government agencies. TThe Chinese government agencies' use of social media for outbreak communications remained limited to providing knowledge and information to the public. As the epidemic evolved, the public had declining interest in fact-related messages but became more empathic with the affected people and tended to attribute blame to other individuals or the government. The tendency of increasingly attributing blame to other individuals or the government may push the Chinese government to seek accountability and refine the compensation system for affected people. As more people are affected, the government may adopt a more empathic communication style to address public emotional response.

Conflicts of Interest
None declared.

Multimedia Appendix 1
Descriptions of thematic categories of post contents.

Multimedia Appendix 2
Comparing model fit indices of latent class analysis models with different numbers of latent class by personal accounts and government agency accounts.