Published in Vol 23, No 8 (2021): August

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/27681.
Development of a COVID-19 Web Information Transmission Structure Based on a Quadruple Helix Model: Webometric Network Approach Using Bing


Authors of this article:

Yu Peng Zhu 1, 2; Han Woo Park 2, 3, 4

Original Paper

1Blockchain Policy Research Center, Cyber Emotions Research Institute, Yeungnam University, Gyeongsan-si, Republic of Korea

2Department of Media and Communication, Yeungnam University, Gyeongsan-si, Republic of Korea

3Interdisciplinary Graduate Programs of Digital Convergence Business, Yeungnam University, Gyeongsan-si, Republic of Korea

4Interdisciplinary Graduate Programs of East Asian Cultural Studies, Yeungnam University, Gyeongsan-si, Republic of Korea

Corresponding Author:

Han Woo Park, PhD

Department of Media and Communication

Yeungnam University

214-1 Dae-dong

Gyeongsan-si

Republic of Korea

Phone: 82 53 810 2275

Email: hanpark@ynu.ac.kr


Background: Developing an understanding of the social structure and phenomenon of pandemic information sources worldwide is immensely significant.

Objective: Based on the quadruple helix model, the aim of this study was to construct and analyze the structure and content of the internet information sources regarding the COVID-19 pandemic, considering time and space. The broader goal was to determine the status and limitations of web information transmission and online communication structure during public health emergencies.

Methods: By sorting the second top-level domain, we divided the structure of network information sources into four levels: government, educational organizations, companies, and nonprofit organizations. We analyzed the structure of information sources and the evolution of information content at each stage using quadruple helix and network analysis methods.

Results: The results of the structural analysis indicated that the online sources of information in Asia were more diverse than those in other regions in February 2020. As the pandemic spread in April, the information sources in non-Asian regions began to diversify, and the information source structure diversified further in July. With the spread of the pandemic, for an increasing number of countries, not only the government authorities of high concern but also commercial and educational organizations began to produce and provide significant amounts of information and advice. Nonprofit organizations also produced information, but to a lesser extent. The impact of the virus spread from the initial public level of the government to many levels within society. After April, the government’s role in the COVID-19 network information was central. The results of the content analysis showed that there was an increased focus on discussion regarding public health–related campaign materials at all stages. The information content changed with the changing stages. In the early stages, the basic situation regarding the virus and its impact on health attracted most of the attention. Later, the content was more focused on prevention. The business and policy environment also changed from the beginning of the pandemic, and the social changes caused by the pandemic became a popular discussion topic.

Conclusions: For public health emergencies, some online and offline information sources may not be sufficient. Diversified institutions must pay attention to public health emergencies and actively respond to multihelical information sources. In terms of published messages, the educational sector plays an important role in public health events. However, educational institutions release less information than governments and businesses. This study proposes that the quadruple helix not only has research significance in the field of scientific cooperation but could also be used to perform effective research regarding web information during crises. This is significant for further development of the quadruple helix model in the medical internet research area.

J Med Internet Res 2021;23(8):e27681

doi:10.2196/27681




Background

Since the first reported case of COVID-19 in late 2019, the disease spread rapidly and became a pandemic in March 2020. An infectious disease caused by a pathogen generally spreads to a living host and is easily transmitted from infected individuals. As of January 7, 2021, over 87 million people had been infected with the virus in 190 countries and regions worldwide since the outbreak of COVID-19 in February 2020, resulting in a global catastrophe [1].

Social disasters, including infectious diseases, must be controlled through a process of actual data–based analysis [2]. Real-time assessment is critical for disaster monitoring; special attention must be paid to rapid analysis using relevant social and cultural data at both the macro and micro scales. The core of sociocultural data analysis is to understand, identify, and even predict the risk of transmission. Thus, a system needs to be constructed to collect data based on the disaster type or area, and to support the spontaneous decision-making process. Without this system, the public would face an information overload, as they would feel burdened dealing with an enormous amount of information in critical situations [3].

In today’s knowledge-based society, information production goes beyond traditional media organizations and involves many entities. In particular, with the increased popularity of smartphones, the amount of online information produced by individuals and organizations has grown exponentially. As the information production process becomes increasingly complex, traditional media companies face a situation in which usage time for so-called legacy media has decreased and these media now compete with various other sources [4]. Search engines and web portals are catering to a wider range of user needs than traditional alternatives [5]. Low cost is an important reason that internet information channels have more advantages than traditional information channels [6]. This phenomenon is especially obvious when significant events occur. The same is true for the information about COVID-19. Fear, anger, and other emotions also lead people to believe and spread online information available through nontraditional media, regardless of whether it is fake [7]. This implies that not only individual media and informal organizations but also a large number of formal organizations such as governments, academic institutions, and formal public organizations have started using online media to produce and disseminate information. However, sensitivity to fake news is also often influenced by political ideology [8]. In some countries, citizens do not support direct government control of the news, and are more concerned with information sources in cooperation with the media and other nongovernmental organizations [9]. Therefore, web media represent a strong competitor to traditional media in terms of both production and services.

COVID-19 has had a strong impact on the media system [10-12]. People have created significant numbers of online documents by utilizing new media sources such as Facebook and Twitter [13-15]. In fact, governments in some countries have used these forms of new media to build platforms to help combat the virus [16]. For example, due to the COVID-19 outbreak in Wuhan in February 2020, and the shortage of medical resources and services, Weibo, one of the largest new media companies in China, cooperated with the local government to set up a citizen assistance platform on which citizens with real-name identification could ask for help. This new mode of interaction is difficult to create with traditional media. To deal with the COVID-19 pandemic that is currently threatening the world, it is evident that the greater the number of internet information sources, the stronger the social immune system.

Therefore, in this study, we collected information sources with high online presence through big-data techniques in countries with confirmed COVID-19 cases. Additionally, the information related to COVID-19 in three stages (from February to April to July 2020) was analyzed in detail, and the morphology and trend of the web big data in these first 6 months of the pandemic are discussed. From these large-scale online big data, the information dissemination trends of educational institutes, enterprises, and government, and their contents were analyzed. In the case of an emergency or a disaster, this study can systematically explain the multielement spiral structure of the information transmission source in the cyberspace of major countries, which has high academic and social value.

In general, based on the quadruple helix and network analysis method, this study constructed and analyzed the structure and content of internet information sources of COVID-19 considering time and space. The aim was to determine the status and limitations of web information transmission and online communication structure in public health emergencies. Moreover, based on the content revealed, valuable suggestions are proposed to contribute to the internet communication of future public health events.

Online Information Sources

With the rapid growth of the internet, web data analysis (often called “webometrics”) has become important, and its quantitative vastness and content diversity have been increasing accordingly. As mobile phones, tablets, and other mobile terminals have been growing in popularity, people usually use these terminals to obtain information instead of traditional media. Although some mainstream media outlets have their own web feeds, people generally use digital feeds from search engines such as Bing or Google to obtain information. Therefore, the study and analysis of comprehensive network information is often more objective than the study of specific news media, and more comprehensive information can be collated. For example, Thelwall [17] used Wikipedia data collected by Bing to study public interest in astronomy. Park et al [18] used Twitter and YouTube to analyze the spread of the Occupy Wall Street movement. Park and Lim [19] analyzed North Korean propaganda changes using YouTube media data. Cho and Park [16] used network activity information about the agriculture, forestry, and fishery departments to discuss the use of internet innovation by government organizations. This literature indicates that analysis and research on web sources have drawn useful conclusions in many areas to date.

There has also been substantial research regarding online sources of information at the time a disaster occurs. Jung and Park [20] used webometric methods to track and analyze the information networks of various organizations during the Gumi chemical spill in South Korea. They found that the flow of information between agencies had an impact on mobilizing emergency facilities and planning specific emergency responses. Online sources of information can also help alleviate the damage caused by disasters. Allaire [21] studied the Bangkok floods and found that social media users were obtaining real-time updates that could help to reduce their losses. Park [22] analyzed YouTube social activities during the 2016 South Korean earthquake, and found that YouTube became a channel to raise public crisis awareness and promote safety strategies. Kim et al [23] also elaborated on the role of social media in relaying information during disasters. By studying data from online information sources during the 2017 storms in the United States, they found that the flow of information across the network was controlled by many types of users. Song et al [24] studied the differences and the range of emotions people felt toward local online channels, including publishing boards, Twitter, cafes, blogs, and news, that delivered information related to Middle East respiratory syndrome (MERS). Some scholars have also analyzed network information source data regarding COVID-19. For example, Park et al [25] collected Twitter data and found that monitoring public dialog and rapidly spreading media news can help professionals make complex and rapid decisions. However, web page data in internet information sources are often more stable than those in a social media environment [26]; therefore, we adopted web page data for analysis.

To grasp and respond to the situation of worldwide disasters such as COVID-19, we collected and analyzed the network information source data of countries with a large number of confirmed cases to understand how the disaster-related information provided by multiple sources changes over time. We identified three time periods (February, April, and July 2020), and performed a detailed analysis of the differences in information sources at these times. We also studied the structure of national cyberspace information sources. Through these analyses, we identified the structural evolution of web information publishers and clearly revealed the dynamic changes of information content in each period.

Research Questions

Our primary research questions were as follows: (1) What are the structures and form of the COVID-19 web-mediated network among countries? (2) Are there differences in the keywords and topics of COVID-19–related online information at different stages?


Data Collection

Data were collected using Webometric Analyst 4.1 through Bing, which is one of the most widely used search engines and is available in most countries and regions, including mainland China. Like Google, the Bing search engine is often used to carry out scientific research [27-29]. Although Google is the world’s largest search engine, it is not available in some regions, including mainland China. Since web page data from China were very important for this study, we used the Bing platform for data collection. Websites and domains obtained through the search application programming interface service of Bing were analyzed. In February 2020, the most widely used COVID-19 keyword in the world was “coronavirus.” Therefore, the keyword for the data collected on February 12 was “coronavirus.” After February, the terms “COVID-19” and “2019-nCOV” were also widely used. Therefore, we chose “COVID-19 OR Coronavirus OR 2019-nCOV” for the keyword searches performed on April 17 and July 22, 2020. Data were collected in real time, instead of collecting all information during a certain period. In other words, the data for these three time points (February 12, April 17, and July 22, 2020) are the results of real-time relevant searches on Bing for that day. These real-time results were not limited by time of publication, and they may contain previously published information that was still highly popular. In addition, real-time results can reflect the actual state of the internet data at that time.

The time of the first collection was in the initial stage of the outbreak, the time of the second collection corresponded to the stage at which the number of new diagnoses had leveled off after spreading worldwide, and the time of the third collection corresponded to the stage when the number of newly diagnosed patients increased sharply as a second wave. For February 12, we collected data for all 28 countries and regions with confirmed cases, with a total of 9149 data points. For April 17, we collected data from 29 countries and regions with over 7000 confirmed cases and obtained 14,768 data points. For July 22, we collected data from 30 countries and regions with over 70,000 confirmed cases and obtained 14,483 data points. Our data were collected from the countries with the highest number of diagnoses at each stage, not only from English-speaking countries. To ensure consistency, we analyzed only the English-language data on the web pages.

Quadruple Helix

Quadruple helix is a research method based on triple helix, which was proposed in 1995 [30]. Researchers studied the development of the knowledge-based economic structure through the spiral relationships among universities, industries, and the government. At first, the triple helix model was used to explain the interaction among academia, government, and industries, and was often used in research related to knowledge production [31]. However, with the development of the triple helix theory, more elements were considered. In 2009, Carayannis and Campbell [32] introduced elements representing the public into the spiral model, such as the civil society and media, thereby forming a quadruple helix model; they added a research method at the level of new technologies and social needs. In 2010, Carayannis and Campbell [33] added the natural environment factor and constructed the quintuple helix model. Based on this factor, the relationship between innovation and sustainable development can be discussed. During cooperation and communication between various organizations, when one kind of organization occupies a dominant position, it can be considered that this organization is separated from the collection of various organizations, and the relationships among different organizations can be studied and explained through the quadruple, quintuple, or n-tuple helix concept [31]. This spiral structure does not always exist only in the academic, government, and industrial dimensions.

We collated the second top-level domain (TLD) data, which were categorized as data from commercial organizations, educational institutes, governments, and nonprofit organizations. A total of 38,399 domains were collected. To better classify the effective levels, we first sorted all of the collected second TLD data. After frequency analysis, the second TLD data that had an occurrence frequency higher than 1% (91/9149; 148/14,767; 145/14,483) were selected for classification. From 15,813 data units, we extracted four levels, namely governments, commercial enterprises, educational institutes, and nonprofit organizations. We used the quadruple helix model to analyze the structural dynamics of the four institutional levels at different periods in detail. For convenience in figures, we abbreviate government domains such as “.gov,” “.gob,” and “.go” collectively as “G” (governments); educational domains such as “.edu” and “.ac” as “E” (educational institutes); commercial domains such as “.com” and “.co” as “C” (commercial organizations); and nonprofit domains such as “.org” and “.or” as “O” (nonprofit organizations).
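
The classification step can be illustrated with a short Python sketch. It is not the authors' procedure: the label-to-sector mapping is built only from the examples named above, the rule of inspecting the last two labels of a domain is an assumption, and the function names and sample domains are hypothetical.

```python
from collections import Counter

# Mapping built from the examples in the text (".gov", ".gob", ".go", ...).
SECTOR_BY_LABEL = {
    "gov": "G", "gob": "G", "go": "G",
    "edu": "E", "ac": "E",
    "com": "C", "co": "C",
    "org": "O", "or": "O",
}


def sector_label(domain):
    """Return the sector-bearing label of a domain, or None if there is none.

    Assumption: only the last two labels are inspected, last label first,
    so "who.org" yields "org" and "korea.go.kr" yields "go".
    """
    labels = domain.lower().strip(".").split(".")
    for label in reversed(labels[-2:]):
        if label in SECTOR_BY_LABEL:
            return label
    return None


def classify_domains(domains, min_share=0.01):
    """Count domains per sector, using only labels whose share exceeds min_share."""
    label_counts = Counter(sector_label(d) for d in domains)
    label_counts.pop(None, None)  # domains without a recognized label
    total = len(domains)
    sector_counts = Counter()
    for label, count in label_counts.items():
        if count / total > min_share:
            sector_counts[SECTOR_BY_LABEL[label]] += count
    return sector_counts


# Illustrative domains only; not taken from the collected data.
sample = ["cdc.gov", "korea.go.kr", "ox.ac.uk", "bbc.co.uk", "who.org", "example.de"]
sector_counts = classify_domains(sample)
```

Checking the last label before the second-to-last one keeps a domain such as "go.com" in the commercial group rather than the government group.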

Network Analysis

A network analysis method was used to analyze the structure of the quadruple helix in detail. Network analysis is a method of quantitative analysis of nodes and connections in a network. When individuals and organizations act as nodes, the connection between them acts as a link. Through the quantitative results of the structure, the characteristics and nature of the network composed of these entities can be analyzed [34]. Network analysis has been widely used in social science research such as in social media use, knowledge dissemination, and organizational cooperation [35,36]. Although nodes have the same properties in a one-mode network, nodes differ in two-mode networks. Thus, in the one-mode network, nodes are the institutional components of a quadruple helix: government, private/business, educational institutions, and nonprofit organizations. By contrast, the two-mode network focuses on the relationship between the analyzed countries and the four institutional types.

Centrality indices are important quantitative indices in network analysis, including degree centrality, betweenness centrality, eigenvector centrality, closeness centrality, and others [37-40]. The centrality index used in this study was degree centrality. Degree indicates the direct relationship between the nodes [39,40]. In this study, the degree was mainly used to determine whether the governments, education institutions, nonprofit organizations, and commercial organizations of different countries have similar information, and to observe the helix degree of different countries and the four fields. Although betweenness centrality is an important indicator for evaluating the influence of a mediating effect, the link between countries in this study is common information without mediating phenomena; thus, it was not used for this analysis. In addition, the eigenvector centrality is an index to evaluate the importance of each node connected to other nodes. However, in this study, based on the importance of government, education, public authorities, and companies, it was considered to be less important to evaluate the significance of the connected countries’ eigenvector centrality. Closeness centrality is an indicator of the shortest distance, and was also considered to be of little significance for this study. However, the degree centrality index can judge the strength of direct connections between countries and domains. In other words, the more countries connected in a field, the stronger the influence of this field. Therefore, only degree centrality was used for this analysis.
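
To make the degree measure concrete, the following Python sketch computes degrees in a small two-mode network. It assumes the network is stored as a mapping from country to the set of sector letters (G, E, C, O) linked to it; the function name and the placeholder countries are hypothetical, and the study itself used UCINET rather than this code.

```python
def two_mode_degrees(country_sectors):
    """Return (country_degree, sector_degree) for a two-mode network.

    A country's degree is the number of sectors it is linked to;
    a sector's degree is the number of countries linked to it.
    """
    country_degree = {}
    sector_degree = {}
    for country, sectors in country_sectors.items():
        country_degree[country] = len(sectors)
        for sector in sectors:
            sector_degree[sector] = sector_degree.get(sector, 0) + 1
    return country_degree, sector_degree


# Placeholder network, for illustration only.
network = {
    "Country A": {"G", "E", "C", "O"},
    "Country B": {"G", "C"},
    "Country C": {"G", "E"},
}
country_degree, sector_degree = two_mode_degrees(network)

# Dividing by the number of countries gives a normalized sector degree.
normalized = {s: d / len(network) for s, d in sector_degree.items()}
```

In this toy network the government node has the highest degree because all three countries are linked to it, which mirrors the reading that a field connected to more countries has stronger influence.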

We used UCINET6 for network analysis and network visualization, including triple helix network analysis and content analysis. In the content analysis, we used the convergence of iterated correlations (CONCOR) method to cluster the semantic network in which words are regarded as nodes and cooccurrence between words forms a tie. CONCOR is a method of performing repeated cross-node correlation analyses to identify the appropriate level of similarity [41]. In other words, we first organized the relationships among words into matrices, thus forming a network of relationships among words. We then calculated the correlation coefficients between the rows and columns in the matrix and carried out the same calculation for the obtained correlation coefficient matrix. After repeated calculations, a correlation coefficient matrix consisting of only 1 and –1 was obtained, which was thus divided into two categories. We then performed the same calculation for both categories again and obtained four different clusters.
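
The iterated-correlation idea can be sketched in plain Python as follows. This is a simplified illustration rather than the UCINET implementation: the handling of constant profiles, the tolerance, the iteration cap, and the choice to correlate full column profiles at each split are assumptions, and the word matrix uses hypothetical counts.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return 0.0  # undefined for constant profiles; treated as 0 in this sketch
    return sum(a * b for a, b in zip(dx, dy)) / denom


def correlate(columns):
    """Correlation matrix between all pairs of columns."""
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


def split_once(columns, max_iter=100, tol=1e-6):
    """Iterate correlations until entries approach +1 or -1.

    Returns a list of booleans: True for members on the same side as member 0.
    """
    corr = correlate(columns)
    for _ in range(max_iter):
        if all(abs(abs(v) - 1.0) < tol for row in corr for v in row):
            break
        corr = correlate(corr)  # corr is symmetric, so its rows are its columns
    return [v > 0 for v in corr[0]]


def concor(matrix, names, depth=2):
    """Split nodes into at most 2**depth blocks by repeated bisection."""
    def recurse(members, level):
        if level == 0 or len(members) < 2:
            return [members]
        columns = [[row[j] for row in matrix] for j in members]
        same_side = split_once(columns)
        left = [m for m, s in zip(members, same_side) if s]
        right = [m for m, s in zip(members, same_side) if not s]
        if not left or not right:
            return [members]  # no further split possible
        return recurse(left, level - 1) + recurse(right, level - 1)

    blocks = recurse(list(range(len(names))), depth)
    return [[names[i] for i in block] for block in blocks]


# Hypothetical word matrix: off-diagonal cells are co-occurrence counts,
# diagonal cells are each word's own frequency.
words = ["virus", "symptom", "mask", "distancing"]
cooccurrence = [
    [4, 3, 1, 0],
    [3, 4, 0, 1],
    [1, 0, 4, 3],
    [0, 1, 3, 4],
]
two_blocks = concor(cooccurrence, words, depth=1)
```

With `depth=1` the toy matrix separates into the two word pairs that co-occur most strongly; the default `depth=2` applies the same bisection to each half, which corresponds to the four clusters described above.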


The hit counts and domains for each country (or region) are compiled and listed in Tables 1-3 for the three time periods, respectively. We standardized the number of hits and domain names for the three months (February, April, and July), with the maximum value set to 100 and the rest being the ratio of the original value to the maximum value multiplied by 100.
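
The scaling described here can be written in a few lines of Python. This is a minimal sketch, assuming the raw counts are held in a dictionary; the function name and the example counts are hypothetical and are not taken from the study's data.

```python
def scale_to_max_100(raw_counts):
    """Rescale counts so that the largest value becomes 100."""
    peak = max(raw_counts.values())
    if peak <= 0:
        raise ValueError("at least one count must be positive")
    return {name: round(value / peak * 100, 2) for name, value in raw_counts.items()}


# Hypothetical raw hit counts, for illustration only.
example = {"Country A": 4000, "Country B": 1000, "Country C": 30}
scaled = scale_to_max_100(example)
```

Because each month is scaled against its own maximum, the values are comparable within a table but not directly across the three tables.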

Table 1. Hit counts and domains for February (N=28).
Country | Hit counts | Domains
Australia | 100.00 | 100.00
Canada | 32.46 | 53.23
Italy | 24.69 | 67.93
United Kingdom | 23.93 | 33.08
Germany | 21.05 | 43.98
France | 18.30 | 33.46
Spain | 13.66 | 44.61
Belgium | 10.41 | 38.91
Japan | 7.29 | 81.62
India | 5.99 | 33.33
Mainland China | 5.61 | 47.40
Singapore | 5.34 | 69.84
Malaysia | 4.10 | 28.77
United States | 3.11 | 64.39
Hong Kong (China) | 2.26 | 68.44
United Arab Emirates | 2.22 | 7.73
South Korea | 1.63 | 32.07
Taiwan | 1.29 | 35.61
Philippines | 1.29 | 29.91
Sweden | 1.10 | 58.17
Sri Lanka | 0.83 | 10.65
Vietnam | 0.70 | 40.18
Finland | 0.61 | 19.90
Russia | 0.42 | 47.02
Thailand | 0.18 | 41.70
Macau | 0.16 | 14.45
Nepal | 0.06 | 5.83
Cambodia | 0.02 | 7.35

Table 2. Hit counts and domains for April (N=29).
Country | Hit counts | Domains
Canada | 100.00 | 81.26
France | 69.96 | 79.83
England | 61.98 | 58.84
Germany | 53.23 | 83.51
Brazil | 51.71 | 88.14
Italy | 46.01 | 75.92
US | 32.78 | 99.88
Sweden | 31.67 | 100.00
Belgium | 30.61 | 44.13
Japan | 30.42 | 61.80
Austria | 21.48 | 41.04
Netherlands | 20.95 | 93.24
Spain | 18.17 | 59.43
Turkey | 18.10 | 49.23
Chile | 16.73 | 55.63
India | 16.35 | 43.06
Switzerland | 13.92 | 67.50
Denmark | 13.50 | 84.82
Russia | 9.16 | 27.05
Portugal | 9.13 | 54.09
Ireland | 8.33 | 45.20
Korea | 6.39 | 46.74
Peru | 5.86 | 31.55
Mainland China | 4.18 | 35.11
Poland | 3.84 | 75.56
Romania | 3.31 | 49.11
Ecuador | 2.95 | 41.28
Israel | 2.67 | 41.64
Iran | 0.36 | 46.38

Table 3. Hit counts and domains for July (N=30).
Country | Hit counts | Domains
United Kingdom | 100.00 | 3.18
Canada | 47.74 | 87.18
France | 46.76 | 81.88
Mainland China | 36.35 | 49.18
Brazil | 33.40 | 92.00
Germany | 32.22 | 4.59
Italy | 24.56 | 94.24
Argentina | 15.11 | 81.06
Mexico | 13.67 | 89.06
Spain | 11.00 | 4.47
South Africa | 8.47 | 57.65
India | 7.52 | 84.82
Turkey | 7.43 | 71.41
United States of America | 6.35 | 3.41
Russia | 6.25 | 81.76
Colombia | 4.20 | 68.00
Peru | 4.01 | 41.18
Sweden | 3.61 | 83.88
Chile | 2.85 | 77.53
Ecuador | 1.80 | 58.59
Indonesia | 1.53 | 100.00
Pakistan | 1.07 | 57.65
Philippines | 1.02 | 61.41
Egypt | 0.73 | 30.59
Iran | 0.57 | 58.47
Bangladesh | 0.47 | 42.47
Kazakhstan | 0.26 | 44.82
Saudi Arabia | 0.15 | 32.94
Qatar | 0.05 | 35.06
Iraq | 0.02 | 25.41

Table 1 shows that the countries with the highest hit counts in February were Australia, Canada, Italy, the United Kingdom, and Germany. Table 2 shows that the countries with the highest hit counts in April were Canada, France, and the United Kingdom. Table 3 shows that the countries with the highest hit counts in July were the United Kingdom, Canada, and France. In February, there were more domains in Australia, Japan, and Singapore. In April, more domains were observed in Sweden, the United States, and the Netherlands. In July, more domains were found in Indonesia, Italy, and Brazil.

Following these results, we analyzed publishing organizations. We visualized the countries involved in the high-frequency second TLD data as a two-mode network. The data revealed 23 countries in February, 26 countries in April, and 26 countries in July. The visualization results are shown in Figures 1-6. The large “G” in the figures denotes “group.” For example, G(GEOC) means the group in which G (government), E (education), O (nonprofit organizations), and C (commercial organizations) appear simultaneously.
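
The group labels can be derived mechanically from each country's set of sectors. The sketch below assumes a hypothetical mapping from country to sector letters; the letter order follows the G(GEOC) example in the text, and the country names are placeholders rather than results from the figures.

```python
def group_label(sectors, order="GEOC"):
    """Build a label such as "G(GEOC)" from a set of sector letters."""
    return "G(" + "".join(letter for letter in order if letter in sectors) + ")"


def group_countries(country_sectors):
    """Collect countries under the label of the sector combination they share."""
    groups = {}
    for country, sectors in country_sectors.items():
        groups.setdefault(group_label(sectors), []).append(country)
    return groups


# Placeholder data, for illustration only.
observed = {
    "Country A": {"G", "E", "O", "C"},
    "Country B": {"G", "E"},
    "Country C": {"C", "O", "E", "G"},
}
groups = group_countries(observed)
```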

As seen in Figure 1, most of the COVID-19–related messages released in February were from the government, educational, or commercial sectors, with relatively few messages from the nonprofit sector. We divided the countries into several groups based on areas. Among all groups, the countries and regions that received information from these four areas the most included mainland China, Hong Kong, Macao, Australia, and Vietnam. Asian countries accounted for 88% (14/16) of these countries. Information from Italy was primarily from the government and educational enterprises. In Sri Lanka and the United Arab Emirates, information was mainly from the government and educational institutes. In the United States and Russia, information was from the government and the educational and commercial sectors. In the United States, government agencies were primarily concentrated in California. Spain reported more commercial agencies, whereas Belgium reported the most information from the educational field. In general, Asian countries were more diverse regarding the online information shared on COVID-19 in February than other regions. This could be because most of the confirmed COVID-19 cases were diagnosed during this period in Asia, and various regions of the continent were considered to be more sensitive than others [42].

Figure 2 shows the institutional network diagram of the COVID-19–related information released in February 2020. The connection between nodes represents the simultaneous release of COVID-19–related information by these institutions. The width of the connection line represents the frequency of coreleases, and the wider the line, the more simultaneous the releases. The bold line between G and C indicates that the government and commercial area released the most information simultaneously, followed by the government and educational sector, and then the commercial and educational sector. Within the framework of the quadruple helix model, the government and the educational and commercial institutions were the leading producers of COVID-19–related information, and they played a prominent role.
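
One way to obtain such line weights is to project the two-mode network onto the four sectors. The Python sketch below counts, for each pair of sectors, the number of countries in which both appear; treating this count as the corelease frequency is an assumption made for illustration, and the data are placeholders.

```python
from collections import Counter
from itertools import combinations


def corelease_weights(country_sectors):
    """Count, for each sector pair, the countries in which both sectors appear."""
    weights = Counter()
    for sectors in country_sectors.values():
        for pair in combinations(sorted(sectors), 2):
            weights[pair] += 1
    return weights


# Placeholder two-mode data, for illustration only.
releases = {
    "Country A": {"G", "C", "E"},
    "Country B": {"G", "C"},
    "Country C": {"G"},
}
weights = corelease_weights(releases)
strongest_pair, strongest_weight = weights.most_common(1)[0]
```

In a drawing of the one-mode network, the pair with the largest weight would receive the widest line.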

Figure 1. Two-mode quadruple helix structure in February 2020. Large "G" refers to the group. G: government domains; E: educational institute domains; C: commercial domains; O: nonprofit organization domains.
Figure 2. One-mode quadruple helix structure in February 2020. G: government domains; E: educational institute domains; C: commercial domains; O: nonprofit organization domains.

As seen in Figure 3, April’s COVID-19 messages were primarily from the government and the educational and commercial sectors, and relatively little information was provided by the nonprofit sector, as was the case in February. The number of countries and regions that received information from all four areas simultaneously was the largest. The areas in which COVID-19 information was released in these countries were relatively diverse, including mainland China, the United Kingdom, Brazil, and Japan, with Asian countries accounting for less than half. This is significantly different from the situation in February because, at this stage, COVID-19 became a pandemic and was no longer concentrated in Asia. Information from Romania, Peru, and Chile was primarily from the government and the educational and nonprofit sectors. In Spain, information was mostly from the government and the commercial and nonprofit sectors. Portugal, Italy, India, and Ireland received information primarily from the government and educational sector. Countries where commercial and nonprofit agencies released more information included Israel, the United States, Austria, and Sweden. For South Korea and Poland, government and commercial sectors released more information. In Russia and the Netherlands, most information was shared by government agencies, while in Switzerland and Belgium, educational institutions were the primary sources of COVID-19–related information. In France, the information was primarily shared through the commercial sector. In general, the online information shared about COVID-19 during April and February was quite different in terms of both countries and institutions. Non-Asian countries diversified their fields as COVID-19 became a pandemic.

Figure 4 shows the institutional network diagram of the COVID-19–related information released in April. The government and the educational sector released the most information at this time. The relationship between the government and commercial sector, and that between commercial and nonprofit organizations was also closer. In April, the government and the educational and commercial institutions were still the leading producers of information, playing a prominent role in information dissemination.

Figure 3. Two-mode quadruple helix structure in April 2020. Large "G" refers to the group. G: government domains; E: educational institute domains; C: commercial domains; O: nonprofit organization domains.
Figure 4. One-mode quadruple helix structure in April 2020. G: government domains; E: educational institute domains; C: commercial domains; O: nonprofit organization domains.

The two-mode diagram of the COVID-19 information release in July shows that most countries were delivering information from a diverse range of sectors (Figure 5). The proportion of individual areas and of countries and regions that received information from only two areas was lower than that in the previous phases. The countries and regions that received information from all four sectors the most included mainland China, Russia, Turkey, and the Philippines. In April, the COVID-19 pandemic continued to spread around the world, and the geographic distribution of information was also seen globally, and not just in Asia. Information from Iraq was primarily from the government and commercial and educational organizations. Information in France was from the government and nonprofit sectors. Government agencies in Chile and Italy provided relatively more information. In the United States, information from the government and educational sector decreased, while information from commercial sectors increased. Less information was collected from Bing in the United States in July. Most regions of the world and many industries were affected by the pandemic in July. The structure of the network for information-publishing organizations also developed from the coexistence of double, triple, and quadruple helices to the main structure of quadruple helices.

Figure 6 shows the institutional network diagram representing the COVID-19–related information released in July. The number of concurrent announcements made by the government and commercial sector remained the highest, followed by the government and educational sector, and then the commercial and educational sectors. In the three stages, the government and the educational and commercial institutions were the leading producers of information and played a prominent role.

We collated the degree centralities in four helices and found that the commercial sector in February had the highest degree, followed by the government and educational sectors, and finally the nonprofit organizations (Figure 7). In April, the sector with the highest degree centrality was again the commercial sector, followed by the government, nonprofit organizations, and finally educational organizations. In July, the government ranked first, commercial organizations ranked second, educational organizations were third, and the nonprofit sector fourth. Thus, the government and commercial organizations played a significant role in the COVID-19 information network, whereas the role of the nonprofit sector was relatively small.
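One way to obtain such a ranking is to project the two-mode network onto the four sectors and sum the tie weights per sector. The sketch below assumes that a tie between two sectors is weighted by the number of regions in which both publish and that degree is the sum of those weights; the study's exact weighting and software are not stated here, and the data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical two-mode data (region -> sectors present); not the study's data.
two_mode = {
    "Region A": {"G", "E", "C", "O"},
    "Region B": {"G", "E", "C"},
    "Region C": {"G", "C"},
    "Region D": {"C", "O"},
}


def project_one_mode(network):
    """Weight each sector pair by the number of regions where both appear."""
    weights = defaultdict(int)
    for sectors in network.values():
        for a, b in combinations(sorted(sectors), 2):
            weights[(a, b)] += 1
    return dict(weights)


def weighted_degree(weights):
    """Sum the tie weights attached to each sector."""
    degree = defaultdict(int)
    for (a, b), weight in weights.items():
        degree[a] += weight
        degree[b] += weight
    return dict(degree)


weights = project_one_mode(two_mode)
degree = weighted_degree(weights)
ranking = sorted(degree, key=degree.get, reverse=True)
print(weights)
print(degree, ranking)
```

In this toy network the commercial sector ranks first only because of how the placeholder memberships were chosen; the ordering carries no empirical meaning.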

Figure 5. Two-mode quadruple helix structure in July 2020. Large "G" refers to the group. G: government domains; E: educational institute domains; C: commercial domains; O: nonprofit organization domains.
Figure 6. One-mode quadruple helix structure in July 2020. G: government domains; E: educational institute domains; C: commercial domains; O: nonprofit organization domains.
Figure 7. Degrees of the three stages. G: government domains; E: educational institute domains; C: commercial domains; O: nonprofit organization domains.

We performed a text analysis and CONCOR analysis for the content of the information shared. For the content analysis, we deleted non-English and scrambled characters during data cleaning. There was a total of 9149 documents in February, and 8889 remained after cleaning. There were originally 14,768 documents in April, 14,766 of which remained after cleaning. The number of documents in July was 14,484, and 13,087 remained after cleaning. Word preprocessing was first performed using Python (the spaCy package) and the results were manually collated. We identified the 50 most frequently found words during each of the three months, which are compiled in Table 4.
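The frequency step can be sketched in a few lines. This is a simplified stand-in rather than the study's pipeline: it uses a regular-expression tokenizer and a tiny stop-word list instead of spaCy, and it assumes that the standardized frequency is each word's count scaled so that the most frequent word equals 100, which is consistent with the top-ranked word scoring 100.00 in each month. The sample documents are invented.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a fuller pipeline would also lemmatize.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}


def tokenize(text):
    """Lowercase, keep alphabetic ASCII tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def standardized_frequencies(documents, top_n=50):
    """Return the top_n words with counts rescaled so the maximum is 100."""
    counts = Counter(token for doc in documents for token in tokenize(doc))
    if not counts:
        return []
    top = counts.most_common(top_n)
    max_count = top[0][1]
    return [(word, round(100 * n / max_count, 2)) for word, n in top]


# Invented sample documents, for illustration only.
docs = [
    "Coronavirus update for the public",
    "Coronavirus health information",
    "Health information on the coronavirus",
    "Coronavirus news",
]
table = standardized_frequencies(docs, top_n=3)
print(table)
```

Scaling to the maximum makes the months comparable even though the number of cleaned documents differs between them.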

Table 4 indicates that in February, people paid the most attention to the affected areas (China), health, departments, international, and news. In April, the content was focused on information regarding deaths, health, the pandemic, and public. In July, the content was focused on information regarding the pandemic, health, and news, and words such as “online,” “service,” “university,” and “government” were also highly ranked. To summarize, the main content in February was dominated by information and news about the outbreak; in April, information was primarily regarding the public and the pandemic; and in July, various online services were used to address the problems caused by the pandemic.

Table 4. The 50 most frequent words of the three stages.
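| Rank | February: word | February: standardized frequency | April: word | April: standardized frequency | July: word | July: standardized frequency |
|---|---|---|---|---|---|---|
| 1 | coronavirus | 100.00 | coronavirus | 100.00 | coronavirus | 100.00 |
| 2 | country | 59.30 | information | 24.96 | information | 36.79 |
| 3 | China | 30.18 | die | 21.70 | pandemic | 24.33 |
| 4 | novel | 28.57 | health | 12.09 | health | 22.67 |
| 5 | health | 27.90 | virus | 9.96 | online | 18.62 |
| 6 | Jan | 21.14 | pandemic | 8.22 | service | 17.20 |
| 7 | international | 19.88 | public | 7.64 | virus | 14.21 |
| 8 | year | 19.53 | update | 6.88 | university | 12.67 |
| 9 | department | 17.85 | country | 6.69 | government | 12.24 |
| 10 | news | 16.35 | pour | 6.62 | news | 12.12 |
| 11 | world | 16.00 | school | 6.60 | provide | 11.88 |
| 12 | school | 15.91 | service | 6.58 | case | 11.28 |
| 13 | new | 15.24 | meet | 5.91 | July | 10.49 |
| 14 | visit | 14.22 | spread | 5.69 | new | 9.98 |
| 15 | statement | 13.62 | April | 5.53 | development | 9.73 |
| 16 | spread | 13.30 | government | 5.24 | education | 9.46 |
| 17 | patient | 12.65 | March | 5.16 | ministry | 9.46 |
| 18 | NSW | 12.44 | SARS | 4.92 | update | 9.46 |
| 19 | case | 10.78 | China | 4.91 | June | 9.43 |
| 20 | epidemic | 10.37 | situation | 4.86 | world | 9.22 |
| 21 | response | 9.04 | help | 4.52 | country | 8.98 |
| 22 | Japan | 8.92 | case | 4.51 | China | 8.49 |
| 23 | student | 8.92 | work | 4.51 | support | 8.49 |
| 24 | find | 8.81 | student | 4.48 | work | 8.31 |
| 25 | official | 8.81 | new | 4.47 | help | 8.28 |
| 26 | category | 8.72 | continue | 4.36 | business | 8.19 |
| 27 | university | 8.52 | provide | 4.36 | time | 7.83 |
| 28 | staff | 8.40 | novel | 4.13 | public | 7.59 |
| 29 | information | 7.98 | outbreak | 4.13 | spread | 7.35 |
| 30 | symptom | 7.87 | community | 4.01 | March | 7.29 |
| 31 | child | 7.43 | care | 3.91 | student | 7.29 |
| 32 | like | 6.96 | support | 3.91 | include | 7.26 |
| 33 | continue | 6.82 | online | 3.88 | Pakistan | 7.10 |
| 34 | city | 6.71 | people | 3.75 | SARS | 7.04 |
| 35 | public | 6.67 | website | 3.75 | India | 6.86 |
| 36 | Feb | 6.59 | maatregelen | 3.67 | Indonesia | 6.77 |
| 37 | ministry | 6.12 | include | 3.64 | community | 6.71 |
| 38 | disease | 5.98 | medidas | 3.53 | website | 6.56 |
| 39 | big | 5.97 | time | 3.52 | year | 6.50 |
| 40 | monitor | 5.95 | business | 3.38 | measure | 6.47 |
| 41 | pneumonia | 5.83 | impact | 3.23 | continue | 6.44 |
| 42 | update | 5.75 | man | 3.11 | disease | 6.41 |
| 43 | government | 5.36 | find | 3.08 | national | 6.41 |
| 44 | service | 5.33 | university | 3.07 | people | 6.35 |
| 45 | unfold | 5.07 | page | 3.06 | find | 6.32 |
| 46 | view | 4.81 | contact | 3.02 | medidas | 6.32 |
| 47 | Singapore | 4.63 | disease | 2.98 | home | 6.26 |
| 48 | cause | 4.60 | staff | 2.94 | social | 6.14 |
| 49 | infect | 4.58 | late | 2.93 | South | 6.14 |
| 50 | Chinese | 4.29 | measure | 2.92 | read | 6.08 |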

To further assimilate valuable information, we constructed the semantic network of high-frequency words and used CONCOR analysis for clustering. Furthermore, we obtained a visualization diagram of the clustering network in the three stages.
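For readers unfamiliar with CONCOR (convergence of iterated correlations), the core idea can be sketched in plain Python: correlate the columns of the word co-occurrence matrix, correlate the columns of the resulting correlation matrix, and repeat until every entry approaches +1 or -1; the signs then divide the words into two blocks. The word list and co-occurrence counts below are invented, and this single split is only an illustration, not the software or settings used in the study.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlate_columns(matrix):
    """Correlation matrix between the columns of a square matrix."""
    cols = list(zip(*matrix))
    return [[pearson(a, b) for b in cols] for a in cols]


def concor_split(matrix, max_iter=100, tol=1e-6):
    """Split node indices into two blocks by iterating column correlations."""
    corr = correlate_columns(matrix)
    for _ in range(max_iter):
        if all(abs(abs(v) - 1.0) < tol for row in corr for v in row):
            break
        corr = correlate_columns(corr)
    # Nodes positively correlated with node 0 form one block, the rest the other.
    first = [i for i, v in enumerate(corr[0]) if v > 0]
    second = [i for i, v in enumerate(corr[0]) if v <= 0]
    return first, second


# Invented symmetric co-occurrence counts for six high-frequency words.
words = ["school", "student", "education", "virus", "spread", "case"]
cooc = [
    [0, 4, 4, 0, 0, 1],
    [4, 0, 4, 0, 1, 0],
    [4, 4, 0, 1, 0, 0],
    [0, 0, 1, 0, 4, 4],
    [0, 1, 0, 4, 0, 4],
    [1, 0, 0, 4, 4, 0],
]
first, second = concor_split(cooc)
blocks = ([words[i] for i in first], [words[i] for i in second])
print(blocks)
```

Applying the same split again within each block would yield four groups, matching the number of groups reported for each month.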

Group 1 in Figure 8 is named “Ministry and University,” which contains information on services provided by the government departments, public authorities, and schools. Group 2 is named “Virus Spreading,” which primarily includes confirmed cities, patient symptoms, and the spread of the infection. Group 3 is “Coronavirus and Health” and Group 4 is “Child and Education.” Most of the information in Group 4 is related to “NSW (New South Wales, Australia) education” and “child,” and it showed the highest frequency in February.

Figure 8. Semantic network in February 2020.

Figure 9 shows the semantic network for April. Group 1 is a school and student-related group named “Education Issue.” Group 2 is “Virus Spreading,” including information about the outbreak and the spread of the pandemic. Group 3 is “Virus Description,” which contains information related to the characteristics of the virus. Group 4 is “Commercial Issue,” which includes words such as “business,” “government,” “service,” and “help,” and is related to the social change brought about by the pandemic.

Figure 9. Semantic network in April 2020.

In July, the semantic network was also divided into four groups, as shown in Figure 10. Group 1 is “Distance Education,” which contains information about online education. Group 2 is “City News,” which contains information about the cities affected by the pandemic. Group 3 contains information about the measures taken, and is therefore named “Measures.” The last group is “Commercial Issues,” which includes information regarding “business,” “services,” “government,” “provide,” “community,” and similar.

Figure 10. Semantic network in July 2020.

The clustering topics of the three stages are sorted in Figure 11. The results show that the information in February mainly concerned the response of the government and educational institutes, such as the impact on schools after the virus began spreading; the spread of the virus and health issues were also prominent topics. In addition, since the spread of COVID-19 was at an early stage in February, reports from some specific places were also relatively prominent. By April, the disease had become a pandemic. At this stage, besides educational information, the spread of the virus, the description of the related characteristics of the virus, and the changes of the commercial environment became prominent topics. By July, the focus in education shifted to distance education. As the pandemic could not be fully controlled within a short period of time, most educational institutions began to prepare for or implement online education. By this time, the public had a basic understanding of the virus and how it spreads, and the focus shifted to measures such as how to deal with this spread. The impact of the pandemic on business and society was still an important topic. Information about cities related to the outbreak also continued to appear in the news.

In general, education became a prominent topic of discussion in all three stages. With time, the basic information regarding the virus and its transmission became popularized, and people began to pay more attention to information about measures to prevent its spread. Since the beginning of the pandemic, the situation has changed in terms of business, government policy, and other public issues. Society has also changed. We compared the results of content analysis with the results of the quadruple helix structure and found that the content analysis also confirmed the form of the quadruple helix structure. In the content analysis, the information groups about business issues and government emerged as relatively large; information about education contributed less, although it also formed groups of appreciable size.

Figure 11. Evolution of topic content.

Principal Findings

In this study, we analyzed COVID-19 web information sources from a quadruple helix perspective, and found changes in structure and content at each stage during the first 6 months of online information regarding COVID-19. We also found problems in the structure of information sources in the transmission of relevant information. We here provide detailed suggestions, which can contribute to the internet communication of future public health events.

Based on the quadruple helix model, this study collated and analyzed the structure and content of the network information sources about COVID-19 considering time and space. By sorting the second TLD, we divided the structure of network information sources into four categories: the government, education, companies, and nonprofit organizations, yielding an information source network composed of four levels. The results of the two-mode quadruple helix analysis of the three stages showed that most confirmed cases in February (first stage) occurred in Asia, and the online information sources in Asia were more diversified than those in other regions. As the pandemic spread in April (second stage), non-Asian sources of information began to diversify, and in July (third stage), the sources of web information became globalized. Thus, Asia was more sensitive to the first stage of the pandemic, and information from various industries there responded to this need. However, only some industries in non-Asian regions paid attention in the first stage, and the information source helix did not form, which also led to the slow response to COVID-19 in some regions and the delay in response measures [43]. Since April, the helix has intensified in non-Asian regions owing to the spread of the pandemic to many areas outside Asia, which raised concern across various industries.
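The domain-sorting step can be illustrated with a small classifier. This is a sketch under stated assumptions: the mapping from domain labels (such as gov, go, edu, ac, com, co, org, or) to the four sectors is an illustrative guess at a typical coding scheme, not the one used in this study, and the example URLs are arbitrary.

```python
from urllib.parse import urlparse

# Assumed label-to-sector mapping (G/E/C/O follow the figure legends). The
# specific labels are illustrative guesses, not the study's coding scheme.
SECTOR_LABELS = {
    "gov": "G", "go": "G", "gob": "G",
    "edu": "E", "ac": "E",
    "com": "C", "co": "C",
    "org": "O", "or": "O",
}


def classify_host(url):
    """Return the sector code for a URL, or None if no known label matches."""
    host = urlparse(url if "//" in url else "//" + url).hostname or ""
    labels = host.lower().split(".")
    # Check the top-level label first, then the second-level label (e.g., go.kr).
    for label in reversed(labels[-2:]):
        if label in SECTOR_LABELS:
            return SECTOR_LABELS[label]
    return None


for example in ["https://www.cdc.gov/coronavirus", "www.mohw.go.kr",
                "https://www.ox.ac.uk", "example.org", "https://example.fr"]:
    print(example, classify_host(example))
```

Hosts that match none of the labels return None and would need manual coding or exclusion, which is a design choice rather than something the source specifies.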

In general, the two-mode analysis shows that across the three stages, the structure of web information publishing organizations gradually developed from coexisting double, triple, and quadruple helices to a diversified structure centered on the triple and quadruple helices. This suggests that in the face of major public health emergencies, most local information release sources are not comprehensive. This phenomenon has also led to the failure of many industries to anticipate and respond to the pandemic in a timely fashion [44,45]. Our results suggest that the health care sector can call on local information sources in various industries to release appropriate and reasonable information about health and public events in the future, to ensure the timely deployment of all sectors of society and avoid further losses. We used a one-mode quadruple helix structure to analyze the forces of these four levels at various periods in detail. We found that in February, the information shared was most coincident and closely linked between government and commercial organizations, followed by educational and government organizations, and then by the commercial and educational sectors. In April, the government and the educational sector simultaneously released the most information about COVID-19. The relationships between the government and commercial sectors, and between the commercial and nonprofit sectors, were also closer. In July, the number of concurrent announcements about COVID-19 by the government and commercial sectors remained the highest, followed by the government and educational sectors, and then the commercial and educational sectors. We collated the degree centrality of the four sectors across the three stages and found that the commercial sector scored the highest in February, followed by the government and educational sectors, and finally the nonprofit organizations. In April, the sector with the highest degree centrality was again the commercial sector, followed by the government, nonprofit sector, and finally educational organizations. In July, during the third stage, the government played a central role in the COVID-19 information network. Across all three stages, the government and commercial sector played a significant role in COVID-19 network information, and the connection of the nonprofit sector was relatively low. In fact, in the event of major infectious diseases, schools are an important setting that cannot be ignored, as they often gather dense populations [46]. The communication role of the education sector as an information source is not stronger than that of the business and government sectors. However, as educational institutions know more than any other institutions about the actual school and education situation, they should take on more of a role than the government and businesses to ensure the spread of information. In future infectious disease health events, education and industry organizations, along with others, need to release information more quickly and accurately.

This study included an analysis of the quadruple helix structure and the content of the three stages using dynamic progressive detailed analysis. We carried out content analysis on 36,742 pieces of information in the three stages. The results of frequency analysis showed that the most prominent information in February was news about the pandemic. April was dominated by information about the public and the pandemic. The focus in July was the use of various online services to solve problems caused by the pandemic. We then used CONCOR cluster analysis to classify the topics in the three stages and sorted the changes in trends across the stages. The results indicated that in the early stage, there were more reports about the affected areas and about the response of authorities such as governments and schools to the virus; the spread of the virus and health issues were the main points of discussion. The second stage focused on the spread of the virus across the world, which created a global pandemic. At this stage, information about educational hotspots, descriptions of virus-related features, and information about changes in the commercial environment caused by the pandemic also received attention. In the third stage, the educational hotspots narrowed to distance education. The pandemic made face-to-face education difficult, and many educational institutions began to prepare for or implement online education. Public attention at this stage shifted from what the virus was to measures for controlling its spread. In general, education was a prominent topic at all stages. With the change of stage, the information content also changed. In the early stage, the basic situation of the virus and its impact on health attracted most of the attention. Later, the focus was on pandemic prevention measures. The business environment and policy environment have changed since the beginning of the pandemic, and the social changes caused by the pandemic have also become an important discussion topic.

Limitations

Owing to the large amount of data from all countries worldwide, this study used only the web information for countries with a significant number of diagnosed cases at each stage as the research object. In addition, we only used data from Bing. Although Bing is more widely used than any other search engine in the webometric field, it does not have a strong market share in some parts of the world that rely more on other search engines. For example, Google has the largest market share in the United States, Baidu has the largest share in China, and Naver has the largest share in South Korea. Therefore, the results of different search engines in individual regions may vary somewhat from those of Bing. In addition, there is no ideal description of the web network structure [47]. Search engines are considered engineering products more than mathematical tools [48]. Different search engines often have divergent algorithms and search results, which inevitably produce repeated and mixed results. Since search engines usually balance quality and efficiency, this could also lead to problems related to Type I and Type II errors, which objectively lead to insufficient coverage [48,49]. These can be considered as limitations of the study.

Conclusions

This study focused on the structure of information sources at each stage of the first 6 months of the COVID-19 pandemic and the development of the network structure through the quadruple helix framework. We found that for public health emergencies, some online and offline information sources were not sufficient. Diversified institutions need to pay attention to public health emergencies, and actively respond to multihelical information sources, which is conducive to implementing a timely and more comprehensive response to public health emergencies. In terms of published messages, the educational sector plays an important role in public health events. However, educational institutions release less information than governments and businesses. In addition, we summarized the trend of COVID-19 online information dissemination. It is important to understand the communicational structure of pandemic information sources worldwide. Currently, the quadruple helix model is primarily used in the field of scientific cooperation in terms of coauthorship analysis, and research in other fields is insufficient. This study highlights that the quadruple helix not only has theoretical significance in the scientific innovation field but can be also used to conduct effective research regarding web information. This is significant for further development of the quadruple helix model with respect to the COVID-19 pandemic.

Acknowledgments

This research is partially funded by the Foundation for Broadcast Culture (방송문화진흥회) in the Republic of Korea. The data for February were analyzed and reported from a different scheme for a Korean-language journal. Since then, the data for April and July have been compiled only for this study, as COVID-19 has rapidly become more serious over time.

Conflicts of Interest

None declared.

References

  1. COVID-19 Dashboard. Center for Systems Science and Engineering (CSSE), Johns Hopkins University. URL: https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6 [accessed 2021-01-07]
  2. Kim J, Ashihara K. National disaster management system: COVID-19 case in Korea. Int J Environ Res Public Health 2020 Sep 14;17(18):e6691.
  3. Horrigan JB. Information overload. Pew Research Center. 2016 Dec 07. URL: https://www.pewresearch.org/internet/2016/12/07/information-overload [accessed 2021-01-06]
  4. Park HW, Kim JE, Zhu YP. Online information sources of coronavirus using webometric big data. J Korea Acad Ind Coop Soc 2020 Nov 30;21(11):728-739. In Korean.
  5. Kink N, Hess T. Search engines as substitutes for traditional information sources? An investigation of media choice. Inf Society 2008 Jan 14;24(1):18-29.
  6. Kitamura S. The relationship between use of the internet and traditional information sources. SAGE Open 2013 May 21;3(2):215824401348969.
  7. Han J, Cha M, Lee W. Anger contributes to the spread of COVID-19 misinformation. HKS Misinfo Rev 2020 Sep 17;1:1-14.
  8. Calvillo DP, Ross BJ, Garcia RJB, Smelter TJ, Rutchick AM. Political ideology predicts perceptions of the threat of COVID-19 (and susceptibility to fake news about it). Soc Psychol Person Sci 2020 Jul 22;11(8):1119-1128.
  9. Marco-Franco JE, Pita-Barros P, Vivas-Orts D, González-de-Julián S, Vivas-Consuelo D. COVID-19, fake news, and vaccines: should regulation be implemented? Int J Environ Res Public Health 2021 Jan 16;18(2):e744.
  10. Casero-Ripolles A. Impact of Covid-19 on the media system. Communicative and democratic consequences of news consumption during the outbreak. EPI 2020 Apr 23;29(2):e290223.
  11. Park HW, Chung SW. Editor's note response to Friedman's "The world before corona and the world after": A perspective raging from the development of civilization to the harmony of east and west, and the paradigm shift. J Contemp East Asia 2020 Dec 31;19(2):169-178.
  12. Chong M, Park HW. COVID-19 in the Twitterverse, from epidemic to pandemic: information-sharing behavior and Twitter as an information carrier. Scientometrics 2021 Jun 23;126:1-25.
  13. Kuchler T, Russel D, Stroebel J. JUE Insight: The geographic spread of COVID-19 correlates with the structure of social networks as measured by Facebook. J Urban Econ 2021 Jan 9:103314.
  14. Li D, Ko N, Chen Y, Wang P, Chang Y, Yen C, et al. COVID-19-related factors associated with sleep disturbance and suicidal thoughts among the Taiwanese public: a Facebook survey. Int J Environ Res Public Health 2020 Jun 22;17(12):e4479.
  15. Xue J, Chen J, Hu R, Chen C, Zheng C, Su Y, et al. Twitter discussions and emotions about the COVID-19 pandemic: machine learning approach. J Med Internet Res 2020 Nov 25;22(11):e20550.
  16. Cho SE, Park HW. Government organizations’ innovative use of the Internet: The case of the Twitter activity of South Korea’s Ministry for Food, Agriculture, Forestry and Fisheries. Scientometrics 2011 Sep 28;90(1):9-23.
  17. Thelwall M. Does astronomy research become too dated for the public? Wikipedia citations to astronomy and astrophysics journal articles 1996-2014. EPI 2016 Nov 14;25(6):893-900.
  18. Park SJ, Lim YS, Park HW. Comparing Twitter and YouTube networks in information diffusion: The case of the “Occupy Wall Street” movement. Technol Forecast Soc Change 2015 Jun;95:208-217.
  19. Park HW, Lim YS. Do North Korean social media show signs of change?: An examination of a YouTube channel using qualitative tagging and social network analysis. J Contemp East Asia 2020 Jul 31;19(1):123-143.
  20. Jung K, Park HW. Tracing interorganizational information networks during emergency response period: A webometric approach to the 2012 Gumi chemical spill in South Korea. Gov Inf Quart 2016 Jan;33(1):133-141.
  21. Allaire MC. Disaster loss and social media: can online information increase flood resilience? Water Resour Res 2016 Sep 29;52(9):7408-7423.
  22. Park HW. YouTubers’ networking activities during the 2016 South Korea earthquake. Qual Quant 2017 Mar 27;52(3):1057-1068.
  23. Kim J, Bae J, Hastak M. Emergency information diffusion on online social media during storm Cindy in U.S. Int J Inf Manag 2018 Jun;40:153-165.
  24. Song J, Song TM, Seo D, Jin D, Kim JS. Social big data analysis of information spread and perceived infection risk during the 2015 Middle East Respiratory Syndrome outbreak in South Korea. Cyberpsychol Behav Soc Netw 2017 Jan;20(1):22-29.
  25. Park HW, Park S, Chong M. Conversations and medical news frames on Twitter: infodemiological study on COVID-19 in South Korea. J Med Internet Res 2020 May 05;22(5):e18897.
  26. O'Neil M, Raissi M, Turner B. The case for asymmetry in online research: Caring about issues in Australian and Canadian Web 1.0 bee networks. Int J Commun 2020;14:5150-5173.
  27. Lu Y, Chen Y. Hierarchical cluster and multidimensional scaling analysis of video websites based on URL co-occurrence. 2020 Sep 22 Presented at: 2020 IEEE International Conference on Power, Intelligent Computing and Systems (ICPICS); July 28-30, 2020; Shenyang, China p. 590-594.
  28. Bianchi G, Bruni R, Daraio C, Laureti Palma A, Perani G, Scalfati F. Exploring the potentialities of automatic extraction of university webometric information. J Data Inf Sci 2020 Nov 21;5(4):43-55.
  29. Park S, Park HW. Diffusion of cryptocurrencies: web traffic and social network attributes as indicators of cryptocurrency performance. Qual Quant 2019 Jan 25;54(1):297-314.
  30. Etzkowitz H, Leydesdorff L. The triple helix -- university-industry-government relations: A laboratory for knowledge based economic development. EASST Rev 1995 Jan 1;14(1):14-19.
  31. Leydesdorff L. The triple helix, quadruple helix, …, and an n-tuple of helices: explanatory models for analyzing the knowledge-based economy? J Knowl Econ 2011 Jun 18;3(1):25-35.
  32. Carayannis EG, Campbell DFJ. 'Mode 3' and 'Quadruple Helix': toward a 21st century fractal innovation ecosystem. Int J Technol Manag 2009 Feb 20;46(3/4):201-234.
  33. Carayannis EG, Campbell DFJ. Triple helix, quadruple helix and quintuple helix and how do knowledge, innovation and the environment relate to each other?: A proposed framework for a Trans-Disciplinary analysis of sustainable development and social ecology. Int J Soc Ecol Sustain Dev 2010;1(1):41-69.
  34. Zhu Y, Park HW. Uncovering blockchain research publications in Asia compared to the rest of the world. Korean Data Anal Soc 2020 Apr 30;22(2):513-526.
  35. Park HW, Yoon J. Structural characteristics of institutional collaboration in North Korea analyzed through domestic publications. Scientometrics 2019 Mar 9;119(2):771-787.
  36. Park S, Chung D, Park HW. Analytical framework for evaluating digital diplomacy using network analysis and topic modeling: Comparing South Korea and Japan. Inf Process Manag 2019 Jul;56(4):1468-1483.
  37. Park HW, Leydesdorff L. Decomposing social and semantic networks in emerging “big data” research. J Informetrics 2013 Jul;7(3):756-765.
  38. Yoon J, Yang JS, Park HW. Quintuple helix structure of Sino-Korean research collaboration in science. Scientometrics 2017 Jul 27;113(1):61-81.
  39. Borgatti SP, Everett MG. A graph-theoretic perspective on centrality. Soc Netw 2006 Oct;28(4):466-484.
  40. Freeman LC. Centrality in social networks conceptual clarification. Soc Netw 1978 Jan;1(3):215-239.
  41. Chung CJ, Biddix JP, Park HW. Using digital technology to address confirmability and scalability in thematic analysis of participant-provided data. Qual Rep 2020 Sep 11;25(9):3298-3311.
  42. Danowski JA, Park HW. East Asian communication technology use and cultural values. J Contemp East Asia 2020 Jul 31;19(1):43-58.
  43. Toshkov D, Carroll B, Yesilkagit K. Government capacity, societal trust or party preferences: what accounts for the variety of national policy responses to the COVID-19 pandemic in Europe? J Europ Pub Policy 2021 May 17:1-20.
  44. Hanage WP, Testa C, Chen JT, Davis L, Pechter E, Seminario P, et al. COVID-19: US federal accountability for entry, spread, and inequities-lessons for the future. Eur J Epidemiol 2020 Nov;35(11):995-1006.
  45. Antwi SH, Getty D, Linnane S, Rolston A. COVID-19 water sector responses in Europe: A scoping review of preliminary governmental interventions. Sci Total Environ 2021 Mar 25;762:143068.
  46. Gemmetto V, Barrat A, Cattuto C. Mitigation of infectious disease at school: targeted class closure vs school closure. BMC Infect Dis 2014 Dec 31;14:695.
  47. Barnett GA, Park HW. Examining the international internet using multiple measures: new methods for measuring the communication base of globalized cyberspace. Qual Quant 2012 Sep 29;48(1):563-575.
  48. Thelwall M. Extracting accurate and complete results from search engines: Case study windows live. J Am Soc Inf Sci 2007 Jan 01;59(1):38-50.
  49. Park HW. How do social scientists use link data from search engines to understand internet-based political and electoral communication? Qual Quant 2011 Jan 1;46(2):679-693.


Abbreviations

CONCOR: convergence of iterated correlations
TLD: top-level domain


Edited by C Basch; submitted 02.02.21; peer-reviewed by Q Liu, R Sauvayre; comments to author 30.03.21; revised version received 16.06.21; accepted 10.07.21; published 26.08.21

Copyright

©Yu Peng Zhu, Han Woo Park. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 26.08.2021.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.