Mining the Characteristics of COVID-19 Patients in China: Analysis of Social Media Posts

Background: In December 2019, pneumonia cases of unknown origin were reported in Wuhan City, Hubei Province, China. The illness was identified as coronavirus disease (COVID-19), and the number of cases grew rapidly through human-to-human transmission in Wuhan. Social media, especially Sina Weibo (a major Chinese microblogging social media site), has become an important platform for the public to obtain information and seek help. Objective: This study aims to analyze the characteristics of suspected or laboratory-confirmed COVID-19 patients who asked for help on Sina Weibo. Methods: We conducted data mining on Sina Weibo and extracted the data of 485 patients who presented with clinical symptoms and imaging descriptions of suspected or laboratory-confirmed cases of COVID-19. In total, 9878 posts seeking help on Sina Weibo


Background
In December 2019, pneumonia cases of unknown origin were reported in Wuhan City, Hubei Province, China. The illness was identified and officially named coronavirus disease 2019 (COVID-19), which is caused by a novel viral strain called severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1][2][3] and resembles severe acute respiratory syndrome coronavirus (SARS-CoV) [4]. Since the outbreak, COVID-19 has spread rapidly. Person-to-person transmission in hospital and family settings has occurred through close contact [5,6]. On January 23, 2020, Wuhan shut down public transportation and was placed under lockdown, and residents were not allowed to leave the city. As of February 20, 2020, the cumulative number of laboratory-confirmed patients in Wuhan was 45,346. The health care system was further overburdened as patients with mild symptoms sought hospitalization instead of self-isolation, mainly due to the anxiety and panic instigated by the epidemic [7]. After failing to be admitted to a hospital, patients sought help on Sina Weibo, a Chinese microblogging site similar to Twitter that allows people to communicate and share information instantly [8]. Social media has become an important channel for promoting risk communication during the crisis [9,10] and can be used to measure public attention given to public health emergencies [11], such as H7N9 [12][13][14], Ebola [9][15][16][17][18][19], Zika virus [10,20,21], Middle East respiratory syndrome (MERS-CoV) [22], and Dengue fever [23].
Since the COVID-19 outbreak, social media, especially Sina Weibo, has become an important platform for the public to obtain epidemic-related information quickly and effectively. According to the official outbreak data released by Sina Weibo on February 26, 2020, 51.2 million users cumulatively posted 350 million pieces of epidemic-related content. Online readership of epidemic-related topics reached 754.5 billion. Sina Weibo established a communication channel that allowed the government to effectively listen and respond to public opinion quickly. Here, by collecting data from Sina Weibo from February 3 to 20, 2020, we aim to analyze the characteristics of suspected or laboratory-confirmed patients with the SARS-CoV-2 infection.

Objective
In this study, we describe the characteristics of suspected or laboratory-confirmed patients with the SARS-CoV-2 infection, the distribution of patients throughout Wuhan, and the relationship between helpers (eg, relative, friend, spouse, sibling) and patients. Social media was used to obtain timely access to public demand so that the government and the health department could identify high-risk patients and take measures to help these patients.

Overview
Sina Weibo launched a platform to provide online help channels for patients infected with SARS-CoV-2. From February 3 to 20, 2020, we obtained 9878 posts by using the keyword 肺炎 患者求助 (COVID-19 pneumonia patients seeking help) from Sina Weibo through its application programming interface (API). Python (Python Software Foundation) was used to implement a rule-based screening and classification method on the PyCharm platform. We used the collected posts as a training set, including related posts and unrelated posts. Based on the post-for-help rules formulated by Sina Weibo, we considered a post to be related if its text contained keywords pertaining to name, age, home address, time of illness, and description of illness; otherwise, it was deemed irrelevant. We excluded 6922 irrelevant posts that only described opinions and feelings about help seeking related to COVID-19 and initially collected 2956 related posts that contained mentions of clinical symptoms and/or imaging descriptions. We then manually screened these posts and excluded 1679 reposts, 556 posts with a significant amount of missing valid clinical data, 195 nonpneumonia patient posts, and 41 patient posts with non-Wuhan home addresses. Finally, we selected 485 patient posts that presented clinical symptoms and imaging descriptions (Figures 1 and 2). The number of patient posts on Sina Weibo has been declining because these patients have actively deleted posts upon hospital admission.
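The rule-based screening step described above can be illustrated with a minimal Python sketch. The keyword patterns in `FIELD_PATTERNS` and the function names are hypothetical, since the actual rules used in the study are not given in the text; only the post counts come from the paper.

```python
import re

# Hypothetical keyword rules, one pattern per field required by the
# post-for-help format; the study's actual rules are not published here.
FIELD_PATTERNS = {
    "name": re.compile(r"姓名"),
    "age": re.compile(r"年龄"),
    "address": re.compile(r"住址|地址|小区"),
    "onset_time": re.compile(r"患病时间|发病时间"),
    "illness": re.compile(r"病情描述|发热|发烧|咳嗽"),
}


def is_related(post_text):
    """Return True only when the post mentions every required field."""
    return all(p.search(post_text) for p in FIELD_PATTERNS.values())


def screening_flow(total, exclusions):
    """Subtract each exclusion count from the total and return the remainder."""
    remaining = total
    for count in exclusions.values():
        remaining -= count
    return remaining


# Counts reported in the text.
related = screening_flow(9878, {"irrelevant": 6922})
included = screening_flow(
    related,
    {"reposts": 1679, "missing_data": 556, "nonpneumonia": 195, "non_wuhan": 41},
)
```

With the reported counts, `related` and `included` reproduce the 2956 related posts and the 485 included patient posts, which is a quick consistency check on the exclusion flow.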
We collected clinical symptoms, chest computed tomography (CT) findings (the chest CT was only summarized for those who provided a clinical report), days from illness onset to online help, days from illness onset to real-time reverse transcription-polymerase chain reaction (RT-PCR) testing, RT-PCR test results, the relationship between helpers and patients, and home address details from Sina Weibo's records. We performed a study on the clinical characteristics of suspected or laboratory-confirmed patients with the SARS-CoV-2 infection seeking help on Sina Weibo. Suspected cases were identified as having fever or respiratory symptoms such as shortness of breath, cough, productive sputum, or chest pain. A laboratory-confirmed case of SARS-CoV-2 infection was defined as a positive result on high-throughput sequencing or an RT-PCR assay of throat swabs and sputum [2].
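The case definitions above can be written as a small decision rule. The following Python sketch is illustrative only: the function name, argument names, and the "not a case" label are assumptions, and the symptom set lists only the examples named in the text, which the paper introduces with "such as" and so may not be exhaustive.

```python
# Symptoms named in the text as examples for a suspected case.
SUSPECTED_SYMPTOMS = {
    "fever",
    "shortness of breath",
    "cough",
    "productive sputum",
    "chest pain",
}


def classify_case(symptoms, rt_pcr_positive=False, sequencing_positive=False):
    """Apply the case definitions: a positive test takes precedence."""
    if rt_pcr_positive or sequencing_positive:
        return "laboratory-confirmed"
    reported = {s.strip().lower() for s in symptoms}
    if reported & SUSPECTED_SYMPTOMS:
        return "suspected"
    return "not a case"  # hypothetical label for posts meeting neither rule
```

For example, `classify_case(["Fever", "cough"])` yields a suspected case, while any post reporting a positive RT-PCR result is treated as laboratory-confirmed regardless of symptoms.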
We also used a descriptive research methodology to analyze the distribution of patients throughout Wuhan and the relationship between helpers and patients. The distance from patients' home to the nearest designated hospital was calculated using the geographic information system ArcGIS. The data used in the current study is publicly accessible on Sina Weibo and readers can obtain the raw data online [24]. We have effectively protected the privacy of subjects and strictly adhered to the principle of confidentiality in terms of information collection, storage and transmission, and information use and deletion. The study was approved by the Shanghai Jiaotong University Xinhua Hospital Ethics Committee and was carried out in accordance with the Declaration of Helsinki. We have made an application for exemption from informed consent and obtained approval.
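The study computed home-to-hospital distances in ArcGIS. As a rough stand-in for readers without ArcGIS, the sketch below uses the haversine great-circle formula in plain Python; this is a substitute technique, not the authors' workflow, and the function names and any coordinates passed in are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_hospital(home, hospitals):
    """Return (name, distance_km) of the hospital closest to `home`.

    `home` is a (lat, lon) tuple; `hospitals` maps a name to (lat, lon).
    """
    name = min(hospitals, key=lambda n: haversine_km(*home, *hospitals[n]))
    return name, haversine_km(*home, *hospitals[name])
```

Straight-line distance understates the actual walking route, so results from this sketch should be read as a lower bound on travel distance.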

Statistical Analysis
Continuous variables were expressed as median (IQR) when appropriate. Categorical variables were summarized as counts and percentages in each category. Analysis was conducted using SPSS, version 19.0 (IBM). We used ArcGIS, version 10.2.2, to plot the numbers of patients seeking help on a map.
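The analysis itself was run in SPSS. For illustration, the same descriptive summaries can be sketched with the Python standard library; the function names are assumptions, and the quartile method chosen here (`method="inclusive"`) may not match the quantile definition SPSS applies, so IQR bounds can differ slightly.

```python
from collections import Counter
from statistics import median, quantiles


def median_iqr(values):
    """Return (median, first quartile, third quartile) of a numeric sample."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3


def counts_and_percentages(labels):
    """Map each category to (count, percentage of total rounded to 2 places)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: (c, round(100 * c / total, 2)) for k, c in counts.items()}
```

Applied to the helper data reported later in the paper (232 relatives among 329 helpers), `counts_and_percentages` gives 70.52% for relatives.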

Demographic and Clinical Characteristics
We selected 485 patients with suspected or laboratory-confirmed SARS-CoV-2 infection with at least clinical symptoms and imaging descriptions from Sina Weibo. The demographic and clinical characteristics are shown in Table 1.

The Distribution of Patients Throughout Wuhan and the Distance Between Helpers and Hospitals
All patients were located in Wuhan, but more patients lived in the central districts (Hongshan, Jiang'an, Wuchang, Hanyang, and Qiaokou) than in the outskirt districts (Figure 4). We further analyzed the distance between patients and the nearest designated hospital. Among these patients, four had missing home address information. We found that 25 Figure 5).

Principal Findings
This study has shown that patients seeking help on Sina Weibo lived in Wuhan and most of them were elderly. Our analysis of the ages of patients seeking help on Sina Weibo showed that these patients were older: the proportion of patients aged ≥65 years was as high as 46.17%. Zhong et al [3] reported that only a small proportion (15.1%) of 1099 laboratory-confirmed COVID-19 patients were aged ≥65 years. On the other hand, our study has found that the highest incidence was among adults over 50 years of age [25].
Additionally, 23.09% of patients had at least one underlying disorder. Fever was the dominant symptom whereas gastrointestinal symptoms were rare. Ground-glass opacity was the most common pattern on chest CT. Among all laboratory-confirmed COVID-19 patients, the most common pattern on chest CT was ground-glass opacity (56.4%) [3]. Our study has shown that the median time from illness onset to RT-PCR testing was 8 days, and the median time from illness onset to online help was 10 days. A recent study showed that the mean time from onset to hospital admission in 44 patients in Wuhan, with onset before January 1, was 12.5 days; in 189 patients with onset from January 1 to 11, the mean time was 9.1 days [5].
Person-to-person transmission of COVID-19 in hospital and family settings has been increasing [26][27][28][29]. Family clustering played an important part in increasing the number of COVID-19 cases [30]. Our study provided further evidence of human-to-human transmission, although 60.33% of families had no clustered onset, indicating that home isolation may be effective for patients. However, 39.67% of families had suspected and/or laboratory-confirmed cases among family members. In addition, 36.58% of families had 1 or 2 suspected and/or laboratory-confirmed family members. This is also in line with the finding that patients, on average, transmit the infection to 2.2 other people [5]. Therefore, home isolation might lead to the risk of COVID-19 outbreaks in family clusters [31]. This means that it is crucial to strictly isolate patients and trace and quarantine contacts as early as possible [32,33]. The most stringent centralized medical observation measures should be taken as soon as possible to avoid outbreaks in family clusters due to home isolation [31], such as a modular hospital to treat patients with mild illness [34].
Our research also found that the number of patients in the Wuchang, Jiang'an, Qiaokou, Hongshan, and Hanyang districts was greater than in other districts. Figure 4 shows a central agglomeration of patients. This may be consistent with the outbreak of the epidemic in the Huanan Seafood Wholesale Market in the Jianghan district, which was thought to be the initial infection site from an animal source in China [35], or it may be related to the developed economy, convenient transportation, and population density of the city center. Therefore, close contact with family members and actual population movements from the outbreak source were risk factors for the spread of SARS-CoV-2.
In total, 32.22% (155/481) of patients lived more than 3 kilometers away from their nearest designated hospital. According to Baidu maps, adults can walk 4 kilometers in 1 hour. Considering that the patients in this study were older and their health condition may have slowed them down even more, we estimate that patients could walk 3 kilometers in a 1-hour period. Hence, this indicates that a patient would need to walk more than 1 hour to see a doctor since public transportation was suspended at the time. This may be one of the reasons why patients wanted to be admitted to a hospital. In addition, on February 5th, the Wuhan municipal health commission designated 28 hospitals for the treatment of laboratory-confirmed patients with the SARS-CoV-2 infection. The empty bed rate of hospitals within the city was only 3.6%. Thus, patients could not be hospitalized for the various reasons above. This also reflected an insufficiency of medical resources during the initial outbreak [36].
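The walking-time argument above is a simple distance-over-speed calculation. The sketch below makes it explicit in Python; the function name is hypothetical, the 3 km/h default is the authors' estimate for older or ill patients, and the 4 km/h figure is the adult walking speed they attribute to Baidu maps.

```python
def walking_time_hours(distance_km, speed_kmh=3.0):
    """Time in hours to walk `distance_km` at a constant `speed_kmh`."""
    if speed_kmh <= 0:
        raise ValueError("speed_kmh must be positive")
    return distance_km / speed_kmh


# Share of patients with a known address living more than 3 km from the
# nearest designated hospital, using the counts reported in the text.
share_over_3km = round(100 * 155 / 481, 2)

# At the assumed 3 km/h, the 3 km threshold corresponds to one hour on foot.
threshold_time = walking_time_hours(3.0)
```

Because the distances are measured to the nearest designated hospital rather than along actual walking routes, real travel times would likely be longer than this estimate.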
We also explored the relationship between helpers and patients. Judging from the content of Sina Weibo posts asking for help, "Mom" and "Dad" were high-frequency words; 70.52% (232/329) of helpers were the patients' relatives, indicating that the publishers of the help information were mostly the children of the elderly. Unfamiliarity with new technology may have hindered elderly people from seeking assistance from the outside world.
With the rapid and effective dissemination of help information, the People's Daily launched an all-media operation on February 5th to provide online help channels for patients with the SARS-CoV-2 infection. The government implemented a policy to maximize hospital admissions, which led to a rapid decrease in the number of people seeking help on Sina Weibo on February 6th; the number has remained at low levels since February 8th, indicating that the needs of the public had been met. This also means that it is important to establish new and effective communication mechanisms for the dissemination of important factual information in a timely manner. This epidemic has shown that medical resources are insufficiently allocated. There are substantial regional disparities in health care resource availability and accessibility in China [37]. The rapid increase in the number of patients during the initial outbreak led to a relative shortage of medical resources, which may threaten people with poor self-help capabilities such as the elderly. The government and health departments should pay attention to the elderly population during the outbreak. Social media can be used to understand public demand and aid the government in formulating accurate countermeasures following public demands for help. Although social media can establish effective communication channels, using it requires a certain level of access and technical skill, so the government should continue to increase the availability and accessibility of the network to better respond to public health emergencies.

Limitations
Our study has some limitations. First, given that our data were collected from a social media platform, the descriptions of patients' symptoms and laboratory information were likely to be incomplete. Second, the urgent timeline for data extraction and the subjective judgment of the collectors might undermine the data quality to a certain extent. Finally, we learned that most of these patients have been admitted to the hospital with government help and many patients remain in the hospital, so we did not compare the 28-day rate for the composite endpoint.

Conclusions
In summary, our study found that the distance between patients and hospitals and the closure of public transportation further increased the difficulty of hospitalization for the elderly. We recommend the application of centralized medical observations to avoid the spread of COVID-19 through family clusters. In a public health emergency, making full use of available social media platforms can establish effective, factual communication channels and shorten admission times, helping patients get early attention during the Wuhan lockdown. These findings can help the government and health departments pay attention to the elderly population during the outbreak and accelerate emergency responses following public demands for help.