This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
During global health crises such as the COVID-19 pandemic, misinformation spreads rapidly on social media. COVID-19–related misinformation has been analyzed, but little attention has been paid to developing a comprehensive analytical framework for studying its spread on social media.
We propose an elaboration likelihood model–based theoretical model to understand the persuasion process of COVID-19–related misinformation on social media.
The proposed model incorporates the central route feature (content feature) and peripheral features (including creator authority, social proof, and emotion). The central-level COVID-19–related misinformation feature includes five topics: medical information, social issues and people’s livelihoods, government response, epidemic spread, and international issues. First, we created a data set of COVID-19 pandemic–related misinformation based on fact-checking sources and a data set of posts that contained this misinformation on real-world social media. Based on the collected posts, we analyzed the dissemination patterns.
Our data set included 11,450 misinformation posts, with medical misinformation as the largest category (n=5359, 46.80%). Moreover, the results suggest that both the least (4660/11,301, 41.24%) and most (2320/11,301, 20.53%) active users are prone to sharing misinformation. Further, posts related to international topics exhibited the highest dissemination depth (maximum depth=14) and width (maximum width=2355), giving them the greatest chance of producing a profound and lasting impact on social media. Additionally, 97.00% (2364/2437) of the spread was characterized by radiation dissemination.
Our proposed model and findings could help to combat the spread of misinformation by detecting suspicious users and identifying propagation characteristics.
As early as February 15, 2020, the Director-General of the World Health Organization stated at the Munich Security Conference, “We’re not just fighting an epidemic; we’re fighting an infodemic” [
The spread of misinformation on social media can be amplified by information silos and echo chambers with personally tailored content. Kouzy et al [
On social media, misinformation can be defined as messages that aim to persuade other users. Persuasion theories state that the disseminator, message content, and recipient all have an impact on communication. Apart from studying the posts themselves, it is also necessary to examine the users who spread misinformation on social media. To uncover the characteristics of the spreaders of misinformation, we relied on persuasion theories that can help understand how misinformation is spread on social media. According to the elaboration likelihood model (ELM), a widely used persuasion model, users form their attitudes toward a message using either the central or peripheral path [
The dissemination of misleading information leads to increased public uncertainty, lack of belief in trustworthy sources, and, as a result, increased spread and ineffective containment of the virus [
Theoretical model of the spread of COVID-19–related misinformation on social media.
Persuasion can be defined as “human communication that is designed to influence others by modifying their beliefs, values, or attitudes” [
In the peripheral route, messages rely on the emotional involvement of the recipient, and the recipient is persuaded by more superficial means. Cialdini [
Peripheral evidence of social proof is based on the age-old concept of peer pressure [
Bode and Vraga [
Moreover, researchers examined user-based characteristics to further understand the types of individuals who post or spread misinformation on social media [
Existing research on the propagation characteristics of misinformation spread has focused on temporal factors [
In addition to content-based characteristics, the studies show that tweets from unverified accounts contain more misinformation compared to those from verified accounts (31% for unverified accounts, 12.6% for verified accounts;
In summary, there are two potential gaps in the existing literature that we address in this study. Previous studies have examined the characteristics of misinformation about the COVID-19 pandemic from several perspectives [
Previous research on COVID-19–related misinformation on social media.
Study | Title | Method | Data | Source
Song et al [ | The South Korean government’s response to combat COVID-19 misinformation: analysis of “Fact and Issue Check” on the Korea Centers for Disease Control and Prevention website | Content analysis | 90 posts | Korea Centers for Disease Control and Prevention (KCDC) website
Kouzy et al [ | Coronavirus goes viral: quantifying the COVID-19 misinformation epidemic on Twitter | Statistical analysis | 673 tweets |
Ceron et al [ | Fake news agenda in the era of COVID-19: identifying trends through fact-checking content | Topic analysis | 5115 tweets |
Qin [ | Analysis of the characteristics of health rumors in public health emergencies: Taking the “Shuanghuanglian” incident during the COVID-19 as an example | Case analysis | 134 headings | COVID-19–related rumor list announced by Dingxiangyuan.com
Chen and Tang [ | Analysis of circulating characteristics of rumors on Weibo in public emergencies: a case study of COVID-19 epidemic | Coding and visual analysis | 968 posts | Weibo Rumor Refuting
First, we created a data set of COVID-19 pandemic-related misinformation based on fact-checking sources and then created another data set of circulated posts that contained this misinformation from a real-world social media platform. Based on the collected posts, we further analyzed the dissemination patterns and proposed peripheral-level characteristics of the coronavirus misinformation circulated on social media. The detailed data collection and analysis procedures are described in
Data collection and data analysis process.
Accurate identification of unknown misinformation is difficult for the general public because it requires multidisciplinary expertise. Reliable access to misinformation can be achieved by processing authoritative disconfirming information. For example, from the rebuttal information “Smoking can prevent coronavirus infection. This is false,” we can extract the misinformation “smoking can prevent coronavirus infection.”
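As a toy illustration of this extraction step (not the authors' pipeline, which relied on manual processing of fact-checking sources), a rebuttal phrased in the pattern of the example above can be reduced to the underlying claim programmatically; the marker string is an assumption made for this sketch:

```python
def extract_claim(rebuttal: str) -> str:
    """Strip a trailing rebuttal marker to recover the misinformation claim.
    The '. This is false.' marker is an illustrative assumption; real
    fact-checking items vary in wording and require manual handling."""
    marker = ". This is false."
    if rebuttal.endswith(marker):
        return rebuttal[: -len(marker)] + "."
    return rebuttal
```

Applied to the example in the text, this returns the claim “Smoking can prevent coronavirus infection.” while leaving rebuttals in other formats untouched.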
As sources of authoritative misinformation, we selected three authoritative online platforms: China Internet Joint Rumor-Refuting Platform [
To collect the circulated posts containing COVID-19 pandemic–related misinformation, we extracted keywords from all the collected misinformation, and then created corresponding queries for an advanced search on the Weibo.cn website. Considering the possibility of delayed and long-term dissemination of misinformation, the query search was limited to original posts between December 1, 2019, and February 2, 2021.
To ensure the accuracy of the collected posts, the first round of collection was performed by manual query using a semiautomated collection tool. If there were more than 50 valid retrieved posts, the second round of collection was performed using an automated web crawler followed by data cleaning. After two collection rounds, Weibo posts containing misinformation were matched with the corresponding misinformation and 11,450 posts were finally identified.
A summary of previous research [
The coding scheme developed in this study is shown in
COVID-19 pandemic–related misinformation topics.
Topic | Illustration | Example |
Government response (Chinese-related) | Information related to traffic control, resumption of work and school, suspension of work and school, epidemic prevention measures, and others | It is said that after the disinfectant powder is sprayed over Wuhan today, patients with fever will be transported to designated hospitals. |
Spread of the epidemic (Chinese-related) | Information related to the spread of the pandemic | The son-in-law of the Guanghan family came back from Wuhan for a few days. The family concealed their working address and went to play cards every day. He became ill today. The neighbors were very angry and went to smash his house. |
Medical information (Chinese-related) | Information related to the virus itself, infection, prevention, treatment, disinfection, and other medical information | A doctor friend sent it. In response to this new type of coronavirus, the content of vitamin C (to fight the virus) and echinacea (to enhance immunity) can be used to prevent it. |
Social issues and livelihood of people (Chinese-related) | Information related to celebrities, donation assistance, social aspects, and people’s livelihood | National level response! All rented houses, apartments, shops and factories will be rent-free for one month in February, and rent-free for half a month in March and April! I hope that all “landlords” will respond positively! Overcome the difficulties together |
International issues | Information related to other countries’ response, online political rumors | Japan sent a 1,000-member medical team to Wuhan without masks and slogans. |
Using each post’s Weibo ID with the weibo.cn/repost/ page, we obtained the specific forwarding, liking, and commenting information of each post. Based on the forwarding relationships, we then created the forwarding network of the collected posts. Following Avram et al [
Features of posts containing COVID-19–related misinformation and users who have posted them.
Category | Description | Data type
Forwards | Frequency of forwarding | Integer
Comments | Frequency of commenting | Integer
Likes | Frequency of liking | Integer
Verification status | Verified or not | Verified/Not verified
Verification type | Type of verification | Category
Mrank | Weibo membership level | Integer (0-7)
Urank | User level | Integer (0-48)
Posts_count | Number of posts | Integer
Followers_count | Number of followers | Integer
Following_count | Number of followings | Integer
Apart from users who could not be captured because they were, for example, blocked, a total of 11,301 users who had published posts containing misinformation about COVID-19 were collected on Weibo.
Weibo’s user authentication mechanism provides a channel for different types of users to prove their identity. The type of verification includes verified personal users, government users, media users, and businesses. The user level, as the basic characteristic of Weibo users, can largely represent the activity level of accounts. The higher the user level, the more active the user is. Membership level reflects users’ habits in using Weibo. Users with a high membership level can be considered loyal users.
In addition to profile features, the interactive characteristics (ie, numbers of followers, followings, and posts) can also characterize users’ authority on social media. The number of posts reflects the user’s engagement on the social media platform. Users with a considerable number of followers can share their opinions with a large group of people [
Sentiment characteristics have been recognized as effective features for distinguishing online rumors and fake reviews [
To describe the prevalence of coronavirus-related misinformation on social media, in addition to the number of forwards of each post, we crawled the detailed forwarding information for each post in the data set created in the previous step of the research and collected a list of forwards of the original misinformation posts. The forwarding information for each post included the users who forwarded the original post, the content of the reposts, and the forwards and likes that the reposts received. The Weibo platform uses the “//” symbol to divide forwarded content into different forwarding levels, so the forwarding level of each post can be extracted from the forwarded content. In addition, the dissemination network of each post can be constructed from a series of reposts based on the corresponding forwarding relationships. Thus, apart from posts that could not be captured because they were, for example, blocked or deleted, we constructed a dissemination network for each of 2437 posts that contained misinformation about COVID-19. In these networks, each node represents an individual post, whereas a directed link represents a forwarding relationship from the source node to the repost node. For example, if post A forwards the original post B, then an edge is drawn from node B to node A.
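Assuming the “//” separator convention described above (the exact Weibo formatting may vary), the forwarding level of a repost can be read off from its text, as in this minimal sketch:

```python
def forwarding_level(repost_text: str) -> int:
    """Each '//' in a Weibo repost separates one hop of the forwarding
    chain: a direct repost of the original post (level 1) contains no
    separator, a repost of a repost contains one, and so on. This
    mapping is an illustrative assumption based on the convention
    described in the text."""
    return repost_text.count("//") + 1
```

For example, a repost reading “agreed//@userA: great point” would be placed at forwarding level 2.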
In the constructed dissemination networks, each node represents a single post that was involved in the spread of misinformation related to COVID-19. Based on the network for each original post, the dissemination scale refers to the number of nodes in the network, corresponding to the number of forwards for the original post. The dissemination depth indicates the highest level of repost in the network of the original post, whereas the dissemination width is equal to the number of nodes at the level with the largest number of nodes in the network.
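Under these definitions, the four structural measures can be computed from the forwarding tree by a breadth-first traversal. The dictionary-of-children representation below is an assumption for this sketch, not the authors' implementation:

```python
from collections import deque, defaultdict

def dissemination_metrics(children, root="root"):
    """children maps each post to the posts that forwarded it directly.
    Returns (scale, depth, max_width, avg_width) as defined in the text:
    scale = number of reposts in the network, depth = deepest forwarding
    level, and the widths are the maximum and mean number of reposts per
    level."""
    level_counts = defaultdict(int)
    queue = deque([(root, 0)])
    while queue:
        node, level = queue.popleft()
        if level > 0:  # the original post itself is not counted as a repost
            level_counts[level] += 1
        for child in children.get(node, []):
            queue.append((child, level + 1))
    if not level_counts:
        return 0, 0, 0, 0.0
    widths = [level_counts[lvl] for lvl in sorted(level_counts)]
    return sum(widths), max(level_counts), max(widths), sum(widths) / len(widths)
```

For a post forwarded by A and B, where A is in turn forwarded by C, this yields a scale of 3, a depth of 2, a maximum width of 2, and an average width of 1.5.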
Illustration of the dissemination scale, depth, width, and speed of a sample post. Each node represents a single post that was involved in the spread of misinformation related to COVID-19.
No ethics approval was required as this study was based on publicly available data and involved no personally identifiable data.
To answer the first research question, coding analysis was performed to identify the content types/topics of posts containing COVID-19–related misinformation. A total of 11,450 such posts were categorized under five topics: government response (n=1021), spread of the epidemic (n=639), medical information (n=5359), social issues and livelihood of people (n=4132), and international issues (n=299). The most common theme was medical misinformation (5359/11,450, 46.80%), including misinformation about the virus, infection, prevention, treatment, and disinfection. The second most popular topic was social issues and livelihood of people (4132/11,450, 36.09%), especially related to fake statements about celebrities. This category also included posts referring to donations that were refuted.
To distinguish different topics of posts, the number of posts and corresponding dates are plotted in
Changes in the number of posts containing misinformation over time.
To answer the first research question, we also examined the social proof features of the collected posts, the sentiment features of the posts, and the authority features of the users who posted them.
The collected posts with COVID-19–related misinformation received 11 forwards, 13 comments, and 189 likes, on average. The pie chart in
Social proof features of posts related to various misinformation topics.
Considering the misinformation topics,
In contrast to other topics, posts spreading misinformation related to spread of the epidemic tended to be consistently negative. In particular, misinformation related to the lifting of the lockdown and traffic restrictions expressed very negative emotions, such as “Harbin is closed! Urgent city closure. No chance for any travel.”
Sentiment features of posts related to various misinformation topics.
Among the users who posted messages containing misinformation related to COVID-19, verified users accounted for 46.60% (5266/11,301). Among them, verified personal users were the most prominent sources (2475/5266, 47.00%), followed by media users (1159/5266, 22.01%) and government accounts (1013/5266, 19.24%). The proportion of nonverified users (6035/11,301, 53.40%) was only 6.8 percentage points higher than that of verified users. This suggests that, when detecting misinformation, whether a message was published by a verified account cannot serve as a criterion for determining the authority of the information.
We found clear differences in the authority characteristics of users who posted misinformation on different topics. For international issues, verified users (160/292, 54.8%) outnumbered unverified users (132/292, 45.2%), and verified users (3160/5310, 59.51%) also outnumbered unverified users for medical information. By contrast, misinformation about social issues and people’s livelihoods was more likely to be posted by unverified users (2586/4093, 63.18%) than by verified users (1507/4093, 36.82%), as was misinformation about the spread of the epidemic (409/630, 64.9% unverified).
The user-level and membership-level distributions of users posting misinformation on various topics are shown in the upper part of
The lower part of
In comparison, users who posted misinformation related to international issues and medical information tended to have higher authority than those who posted about the government response, social issues and livelihood of people, and spread of the epidemic. The numbers of posts, followers, and followings of users who posted misinformation related to the government response were the lowest among the five topics, representing users who had less authority.
User-level and membership-level distributions and average interactive features of users posting misinformation about various topics.
Based on the constructed dissemination network for each of the 2437 posts containing COVID-19–related misinformation, we extracted the dissemination scale, depth, maximum width, average width, and speed of each post. In this study, the maximum width measured the number of nodes at the widest level, and the average width measured the average number of nodes across all levels.
Descriptive statistics of the dissemination patterns.
Dissemination measures | Mean (SD) | Maximum |
Scale | 19.7 (236.03) | 7604 |
Depth | 1.5 (0.99) | 14 |
Maximum width | 20.5 (87.82) | 2355 |
Average width | 15.9 (23.74) | 688 |
Speed | 2.4 (8.20) | 96.9 |
Based on the structure, we divided the dissemination network of the misinformation posts into three main types: (1) radiation dissemination network, where the first-level dissemination is wider than all other levels; (2) sector dissemination network, where the width of the other levels in the dissemination network is wider than that of the first level, and the node with the highest forwarding volume reaps more forwards than likes; and (3) viral dissemination network, where the width of other levels in the dissemination network is larger than that of the first level, and the node with the highest like volume reaps more likes than forwards.
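The three rules can be expressed as a simple decision procedure. The function below is an illustrative simplification that summarizes each network by its first-level width, the widest deeper level, and the forward/like counts of the single most-engaged node; it is not the authors' classification code:

```python
def classify_network(first_level_width: int, max_deeper_width: int,
                     top_forwards: int, top_likes: int) -> str:
    """Radiation: the first forwarding level is wider than every deeper
    level. Otherwise, sector vs. viral is decided by whether the most
    engaged node collected more forwards than likes. Ties are an edge
    case the text does not specify and are resolved here arbitrarily."""
    if first_level_width > max_deeper_width:
        return "radiation"
    return "sector" if top_forwards > top_likes else "viral"
```

A post whose 10 direct reposts dwarf every deeper level is thus "radiation", whereas a post whose second level is widest is "sector" or "viral" depending on the forward/like balance of its most engaged node.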
Examination of the posts revealed that 97.00% (2364/2437) were disseminated through the radiation dissemination network, only 0.98% (24/2437) belonged to the sector dissemination network, and 2.01% (49/2437) belonged to the viral dissemination network. As shown in
Confidence interval plots for the dissemination patterns.
Examples of each dissemination network type. (a) Radiation dissemination network. (b) Sector dissemination network. (c) Viral dissemination network.
To further characterize the users who posted misinformation about coronavirus on social media, we leveraged the k-means clustering algorithm to classify users based on user authority features (including user level, membership level, posts count, followers count, and following count). To ensure the quality of the clustering, the NbClust function in R was used to test different values of k. Based on the elbow method, 5 was selected as the optimal number of clusters.
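The clustering step can be sketched as follows. This is a plain-Python re-implementation for illustration (the study used R's NbClust), with a toy two-dimensional data set standing in for the standardized five-feature user vectors:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Minimal Lloyd's algorithm for illustration. points is a list of
    feature vectors (e.g., standardized [user level, membership level,
    posts, followers, followings]); returns (centers, inertia)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster emptied.
        centers = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    inertia = sum(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        for p in points
    )
    return centers, inertia

def elbow_curve(points, ks):
    """Within-cluster sum of squares for each candidate k; the 'elbow'
    where the curve flattens suggests the number of clusters."""
    return [kmeans(points, k)[1] for k in ks]
```

On well-separated data, the within-cluster sum of squares drops sharply until k reaches the true number of groups and then flattens, which is the signal the elbow method exploits.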
As determined by the k-means clustering algorithm, users who posted misinformation were classified into five groups: general users, platform users, inactive users, influential users, and minglers. A total of 2342 users were classified as general users, who participate in social media but are less willing to pay for membership. They tended to have a low membership level but a high user level, and their performance in terms of the number of posts and followers was relatively normal. The behavioral patterns of platform users (comprising 2980 users) were similar to those of general users, except for their membership level. They tended to have significantly higher membership levels, indicating that they were both actively participating in social interactions and purchasing memberships to enjoy the privileges. The largest group, inactive users, comprised 5652 users who appeared at lower frequencies for all five features. In contrast, influential users, the smallest group of users (comprising 101 users), posted more frequently than others and also reaped a large number of followers. Users in this group tended to remain in the highest position with respect to both user and membership levels. Finally, users in the mingler group had a higher number of followers than the other user groups, but they made fewer posts. The characteristics of the 226 users in the mingler group were consistent with those identified by Kozinets [
The distribution of different types of users posting misinformation on different topics is shown in
Scatterplot and correlation matrix of user authority features. a: The correlation is significant at a significance level of .001 (two-sided); b: The correlation is significant at a significance level of .01 (two-sided); c: The correlation is significant at a significance level of .05 (two-sided).
Distribution of various types of users posting misinformation related to various topics.
We also performed a correlation analysis to test whether the dissemination network features were significantly related to the authority features of the users who created the posts. The Spearman rank correlation coefficient was used to measure the correlations between the authority features of the creators and the features of the resulting dissemination network.
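For reference, the Spearman coefficient is the Pearson correlation of the rank-transformed variables, with tied values receiving their average rank; a dependency-free sketch (constant inputs are not handled):

```python
def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks,
    with average ranks assigned to ties, as used here to relate creator
    authority features to dissemination network features."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1  # average rank for the tied run
            for k in range(i, j + 1):
                r[order[k]] = avg
            i = j + 1
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Because only ranks enter the computation, the coefficient captures any monotone association (e.g., between follower counts and dissemination scale) without assuming linearity.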
From a network perspective, messages posted by users with numerous followers tend to receive more attention on social media. Users with high membership levels are more likely to engage in social media interactions. Similar to the context of disaster-related information [
Spearman correlation between the authority features of post creators and the dissemination network features.
Dissemination variables | Posts count | Followers count | Following count | Membership level | User level
Scale, ρ | 0.114 | 0.344 | 0.009 | 0.171 | 0.107
Scale, P value | <.001 | <.001 | .77 | <.001 | .001
Depth, ρ | 0.103 | 0.349 | 0.008 | 0.170 | 0.100
Depth, P value | .002 | <.001 | .80 | <.001 | .002
Maximum width, ρ | 0.081 | 0.345 | –0.007 | 0.171 | 0.096
Maximum width, P value | .01 | <.001 | .83 | <.001 | .003
Average width, ρ | 0.174 | 0.197 | 0.106 | 0.080 | 0.118
Average width, P value | <.001 | <.001 | .001 | .02 | <.001
Speed, ρ | 0.023 | 0.174 | –0.003 | 0.105 | 0.047
Speed, P value | .48 | <.001 | .93 | .001 | .16
Understanding the underlying psychology of why people fall for misinformation is key to developing effective interventions against it [
Misinformation that attracts attention can trigger intense discussions, thus promoting the spread of information. In addition to the central-level feature, social proofs of posts with COVID-19–related misinformation showed that such misinformation was actively responded to (with an average of 11 forwards, 13 comments, and 189 likes). Interestingly, misinformation related to international issues accounted for 2.61% (299/11,450) of all posts but achieved alarmingly higher attention (with an average of 82 forwards, 67 comments, and 713 likes), suggesting that misinformation involving international issues tends to go viral on social media and thus can have serious consequences. This is consistent with empirical findings on Twitter, where COVID-19–related conspiracy misinformation is most likely to spread [
In contrast to the negative sentiment that emerged among the public during the pandemic [
Analysis of user profile characteristics revealed that users with the lowest and highest levels of user and membership levels were the most responsible for publishing misinformation. Our results suggest that both the least and most active users are prone to sharing misinformation. In contrast to the empirical results of Kouzy et al [
The average number of followers of the misinformation publishers was extremely high (>100,000), indicating the credibility and social influence they possess on social media. Some marketing-oriented accounts changed the main part of genuine news to attract users. As for the medical misinformation, some corporate accounts fabricated misinformation (eg, “Natto can inactivate the virus”) to promote their product.
The average dissemination scale of misinformation posts was 19.7, with an average depth of 1.5 and an average maximum width of 20.5. Li et al [
In capturing the topological attributes of the dissemination network, three main types of networks can be distinguished in the spread of misinformation posts: radiation, sector, and viral. Unlike rumor-spreading on Twitter, in which the news is usually first posted by a low-impact user and then shared by some popular users [
This study has several limitations. First, we only examined misinformation about COVID-19 circulating on Weibo. In addition, we selected “Novel Coronavirus (新冠病毒)/COVID/Epidemic (疫情)” as COVID-19–related keywords. However, due to the potential early inconsistency in disease terminology, users may have used other keywords that were not collected by this study, such as 武汉肺炎 (Wuhan pneumonia) and 不明原因肺炎 (unknown-cause pneumonia), to describe COVID-19–related conversations or topics. Therefore, the characteristics identified in our study may not represent all COVID-19–related misinformation. Future studies should consider misinformation on other social media platforms to ascertain the stability of these findings. Second, we focused on Chinese-language misinformation. Misinformation in other languages about the pandemic could lead to different results, which should also be explored in future work.
In the COVID-19 pandemic, we witnessed a massive infodemic in which fake news and conspiracy theories were spread, especially on social media. This study provides a comprehensive examination of the COVID-19 misinformation spread on a social media platform.
The theoretical contributions of this study lie in the following two aspects. Although efforts have been made to analyze the COVID-19–related misinformation on social media platforms, no comprehensive analytical framework guided by psychological theory exists to study such misinformation, particularly related to COVID-19. Based on the ELM, this work provides a first step toward understanding the underlying persuasion process of COVID-19–related misinformation. By developing a theoretical model of the persuasion process, this study includes a comprehensive set of features to understand the spread of COVID-19–related misinformation on social media. Moreover, whereas previous studies have generally considered the detection of pandemic misinformation as a binary classification problem, our results show that misinformation on different topics appears to have different characteristics in terms of emotion, social engagement metrics, and publisher authority characteristics. Therefore, this study suggests that the development of misinformation detection algorithms and prevention mechanisms should consider the specific topics of misinformation. It is necessary to develop targeted strategies based on the characteristics of misinformation on different topics.
The practical contributions of this study are two-fold. First, although COVID-19–related misinformation has been widely studied, to our knowledge, no research has attempted to uncover the comprehensive characteristics of users who post misinformation about the novel coronavirus on social media. Therefore, this study examined both the profile features and the interactive characteristics of misinformation authors. By revealing the characteristics of misinformation publishers, our results not only extend the research on analyzing COVID-19–related misinformation but also provide a possible solution to the issue of detecting suspicious users who may be prone to posting misinformation. Moreover, the significant positive correlations among the authority features of the users and the topological attributes of the dissemination network indicate the possible influence of authority features on the spread of misinformation. To combat misinformation, our results suggest that it is important for influential users, public organizations, and news media to be aware of their responsibility to provide verified information, especially during a public health crisis.
Table S1. Summary of previous research on the classification of coronavirus-related misinformation on social media.
ELM: elaboration likelihood model
This study was supported in part by the National Natural Science Foundation of China (72004091, 72174083) and the Humanity and Social Science Foundation of Ministry of Education of China (20YJC870014).
None declared.