This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
The number of deaths worldwide caused by coronavirus disease (COVID-19) is increasing rapidly. Information about the clinical characteristics of patients with COVID-19 who were not admitted to hospital is limited. Some risk factors of mortality associated with COVID-19 are controversial (eg, smoking). Moreover, the impact of city closure on mortality and admission rates is unknown.
The aim of this study was to explore the risk factors of mortality associated with COVID-19 infection among a sample of patients in Wuhan whose conditions were reported on social media.
We enrolled 599 patients with COVID-19 from 67 hospitals in Wuhan in the study; 117 of the participants (19.5%) were not admitted to hospital. The demographic, epidemiological, clinical, and radiological features of the patients were extracted from their social media posts and coded. Telephone follow-up was conducted 1 month later (between March 15 and 23, 2020) to check the clinical outcomes of the patients and acquire other relevant information.
The median age of patients with COVID-19 who died (72 years, IQR 66.5-82.0) was significantly higher than that of patients who recovered (61 years, IQR 53-69,
Older age, diffuse distribution, and hypoxemia are factors that can help clinicians identify patients with COVID-19 who have poor prognosis. Our study suggests that aggregated data from social media can also be comprehensive, immediate, and informative in disease prognosis.
In December 2019, a novel coronavirus disease (COVID-19) emerged in China and began to spread globally. As of April 08, 2020, the outbreak has resulted in 82,992 deaths worldwide [
The study protocol was approved by the research ethics committee of Renmin University of China on February 5, 2020. Data were obtained from two sources: Weibo posts and a telephone survey. The Weibo data were posted on the internet by families impacted by COVID-19 between January 20 and February 15, 2020, and were collected between February 3 and February 15, 2020. Then, volunteers phoned each participant’s family to describe the study and obtain their oral consent to participate. Over 60% of the patients (599/911, 65.8%) agreed to participate and completed most of the questions. One month later, a follow-up telephone call was conducted to collect the outcomes of each patient (between March 15 and 23, 2020).
In this study, we used patient reports from Weibo to conduct the analysis. Weibo is a Chinese microblogging website that resembles a hybrid of Twitter and Facebook; it uses a format similar to that of Twitter. This microblog provides a platform for patients infected with COVID-19 in Wuhan to seek help on the internet. Many patients reported their onset of symptoms, listed their symptoms, and uploaded their medical records and computed tomography (CT) images to seek medical care on the internet. We carefully monitored reports of patients infected with COVID-19 on Weibo between January 20 and February 15, 2020, and downloaded the data. We extracted and coded this information and deidentified the patient information by removing their names, home addresses, and contact information. We obtained 911 original COVID-19 patient reports. We only included patients who were diagnosed by positive COVID-19 tests according to the guidance provided by the Chinese National Health Commission. The positive tests consisted of either CT or real-time reverse transcriptase–polymerase chain reaction (RT-PCR) reports on the internet and were further confirmed by the follow-up telephone call.
The Weibo messages were collected from February 3 to 15, 2020. The median time from onset of symptoms to the posting of messages was 7 days (IQR 3-11). We conducted telephone follow-up calls 1 month later (between March 15 and 23) to check the clinical outcomes of the patients and acquire information about their smoking behavior, admission time, time to discharge or death, duration of time in the hospital, name of the hospital, and medication used for hypertension. Only patients who had a definite outcome (died or recovered) were included in our study.
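As an illustration of how an onset-to-posting delay such as the one reported above could be summarized, the following minimal sketch computes the median and IQR of the delay in days. The dates are hypothetical and are not study data, the function name `delay_summary` is an assumption, and the quartile method (`inclusive`) is only one of several conventions; the method used in the study is not stated.

```python
from datetime import date
from statistics import median, quantiles


def delay_summary(onset_dates, post_dates):
    """Return (median, q1, q3) of the days from symptom onset to posting."""
    delays = [(post - onset).days for onset, post in zip(onset_dates, post_dates)]
    # Quartiles by the "inclusive" method; other conventions give slightly different IQRs.
    q1, _, q3 = quantiles(delays, n=4, method="inclusive")
    return median(delays), q1, q3


# Hypothetical example records (not taken from the study).
onsets = [date(2020, 1, 28), date(2020, 2, 1), date(2020, 2, 2),
          date(2020, 2, 5), date(2020, 1, 25)]
posts = [date(2020, 2, 4), date(2020, 2, 6), date(2020, 2, 12),
         date(2020, 2, 8), date(2020, 2, 10)]

med, q1, q3 = delay_summary(onsets, posts)
print(f"median {med} days (IQR {q1}-{q3})")
```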
All Weibo messages were collected on the internet from February 3 to February 15, 2020. The research team extracted vital information from individual patients’ reports on the internet. A series of individual-level patient data, including demographic information, underlying comorbidities, symptoms, and signs, were coded. We double-checked and reviewed the data. The data were entered into a computerized database and cross-checked.
The symptoms coded included hypoxemia, inability to eat, cough, acute respiratory distress (ARD), dizziness, headache, confusion, unconsciousness, hemoptysis, chest pain, muscle pain, fatigue, vomiting, chest distress, loss of appetite, diarrhea, shortness of breath, and fever. The underlying comorbidities coded included chronic respiratory disease, chronic liver disease, hypertension, diabetes, chronic vascular disease, chronic lung disease, chronic heart disease, cancer, and kidney disease. Other information coded from Weibo messages included symptom onset date, sex, and age. In the telephone survey, we specifically asked which hypertension medications the patients were taking. We coded the type of medication as angiotensin-converting enzyme inhibitors (ACEIs), angiotensin II receptor blockers (ARBs), and others.
All CT images in the original posts were extracted and recorded. The CT images and CT reports were evaluated by an experienced radiologist (WJ) who was blinded to the patient survival results when she interpreted the images. The radiologist examined and coded the following features: lesion distribution, lesion characteristics, and pleural effusion.
Our data have been made public so that readers can replicate our analysis. The data can be found in the supplemental materials of this paper (
Descriptive statistics were obtained for all study variables. Categorical variables were described as frequencies and percentages and were compared between outcome groups using the Fisher exact test. The continuous variables were described using the median, range, and SD values and were compared using
Cumulative rates of death were determined using the Kaplan-Meier method, which was also used to plot the survival curves. The associations of age group, hospital admission status, lesion distribution, and pleural effusion with death were examined using the Cox proportional hazard regression model, and multivariate Cox regression was used to determine the independent risk factors for mortality.
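The Kaplan-Meier estimate of cumulative death rates can be sketched as follows. This is a minimal illustration on hypothetical toy data, not the study's analysis code; in particular, treating patients who did not die as censored at their last observed time is an assumption, since the handling of recovered patients is not described here. The function names are illustrative.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct death time.

    durations: follow-up time in days for each patient
    events: 1 if the patient died at that time, 0 if censored
    """
    survival = 1.0
    curve = []
    death_times = sorted({d for d, e in zip(durations, events) if e == 1})
    for t in death_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def cumulative_incidence(curve):
    """Cumulative incidence of death is the complement of survival."""
    return [(t, 1.0 - s) for t, s in curve]


# Hypothetical toy data (assumption): days from symptom onset and death indicator.
days = [5, 8, 8, 12, 15, 20]
died = [1, 1, 0, 1, 0, 0]

km_curve = kaplan_meier(days, died)
for t, inc in cumulative_incidence(km_curve):
    print(f"day {t}: cumulative incidence of death {inc:.3f}")
```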
All statistical tests were 2-tailed.
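For readers who wish to check a categorical comparison, a two-tailed Fisher exact test for a 2x2 table can be sketched as below. The counts are the pleural effusion counts from the radiographic characteristics table in this paper (positive: 3 recovered, 2 died; negative: 186 recovered, 33 died), for which P=.18 is reported. The function name is an assumption, and the two-sided definition used here (summing all tables no more likely than the observed one) is one common convention that may differ slightly from the software the authors used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables that are no more likely than the observed one.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Rows: pleural effusion positive / negative; columns: recovered / died.
p_effusion = fisher_exact_two_sided(3, 2, 186, 33)
print(f"P = {p_effusion:.3f}")
```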
We enrolled 599 patients in our study; of these patients, 516 (86.1%) recovered, and 83 (13.9%) died.
The median time from symptom onset to discharge was much longer than the time to death, namely 36 days (IQR 29.0–44.0) and 14 days (IQR 9.0-20.0,
The median age of the deceased patients (72 years, IQR 66.5-82.0) was significantly higher than that of the recovered patients (61 years, IQR 53-69,
Baseline characteristics of patients with coronavirus disease (N=599).
Characteristic | Patient outcomes | | | P value
 | All (N=599) | Recovered (n=511) | Died (n=83) |
Age (years), median (range) | 63 (2-93) | 61 (2-89) | 72 (33-93) | <.001
Female sex, n (%) | 274 (46.1) | 243 (47.6) | 31 (37.3) | .10
Smoker, n (%) | 123 (22.0) | 106 (21.6) | 17 (24.6) | .54
Symptoms, n (%) | | | |
Fever | 471 (79.8) | 408 (80.3) | 63 (76.8) | .46
Cough | 241 (40.8) | 216 (42.5) | 25 (30.5) | .04
Hemoptysis | 11 (1.9) | 10 (2.0) | 1 (1.2) | >.99
Dyspnea | 291 (49.3) | 244 (48.0) | 47 (57.3) | .12
Shortness of breath | 74 (12.5) | 68 (13.4) | 6 (7.3) | .15
Fatigue | 199 (33.7) | 175 (34.4) | 24 (29.3) | .38
Muscle ache | 19 (3.2) | 19 (3.7) | 0 (0) | .09
Diarrhea | 80 (13.6) | 73 (14.4) | 7 (8.5) | .17
Chest pain | 13 (2.2) | 12 (2.4) | 1 (1.2) | >.99
Vomiting | 72 (12.2) | 67 (13.2) | 5 (6.1) | .07
Chest distress | 64 (10.8) | 61 (12.0) | 3 (3.7) | .02
Loss of appetite | 156 (26.4) | 130 (25.6) | 26 (31.7) | .28
Inability to eat | 24 (4.0) | 18 (3.5) | 6 (7.2) | .13
Hypoxemia | 33 (5.6) | 23 (4.5) | 10 (12.2) | .02
Confusion | 17 (2.9) | 12 (2.4) | 5 (6.1) | .07
Unconsciousness | 15 (2.5) | 12 (2.4) | 3 (3.7) | .45
Dizziness | 18 (3.1) | 16 (3.1) | 2 (2.4) | >.99
Headache | 22 (3.7) | 20 (3.9) | 2 (2.4) | .76
Comorbidities, n (%) | | | |
Hypertension | 87 (14.7) | 79 (15.5) | 8 (9.8) | .24
Diabetes | 57 (9.6) | 49 (9.6) | 8 (9.8) | >.99
Chronic heart disease | 70 (11.8) | 61 (12.0) | 9 (11.0) | .49
Chronic lung disease | 25 (4.1) | 22 (4.3) | 3 (3.7) | >.99
Cerebrovascular disease | 15 (2.5) | 13 (2.5) | 2 (2.4) | .95
Chronic kidney disease | 28 (4.7) | 26 (5.1) | 2 (2.4) | .41
Chronic liver disease | 14 (2.4) | 14 (2.7) | 0 (0) | .13
Chronic respiratory disease | 12 (2.0) | 11 (2.2) | 1 (1.2) | >.99
Cancer | 17 (2.9) | 16 (3.1) | 1 (1.2) | .49
Age distributions of patients with coronavirus disease in our study who recovered and who died.
At baseline, the most common symptoms were fever (471/599, 79.8%) and cough (241/599, 40.8%); see
Univariate Cox regression analysis showed that age older than 70 years (
The multivariable-adjusted Cox proportional hazard regression model showed a significantly lower risk of death in patients with hypertension when controlling for age and sex (
Multivariate Cox regression analysis of the risk factors associated with mortality in patients with coronavirus disease (n=83).
Factor | Deaths, n (%) | Univariate model | | Multivariate model |
 | | Crude hazard ratio (95% CI) | P value | Adjusted hazard ratio (95% CI) | P value
Age group (years) | | | | |
40-59 | 9 (4.6) | 1.28 (0.15-10.93) | .82 | 1.82 (0.21-15.78) | .59
60-69 | 22 (12.5) | 5.19 (0.69-39.04) | .11 | 8.27 (1.07-63.63) | .04
70-79 | 25 (23.4) | 11.7 (1.58-86.87) | .02 | 14.01 (1.85-105.83) | .01
≥80 | 21 (46.7) | 22.92 (3.01-174.35) | .002 | 36.14 (4.68-279.27) | .001
Female sex | 31 (11.3) | 0.74 (0.44-1.25) | .26 | 0.71 (0.40-1.26) | .24
Hospital admission | 33 (7.0) | 0.20 (0.12-0.34) | <.001 | 0.16 (0.093-0.28) | <.001
Hypertension | 8 (9.2) | 0.80 (0.38-1.69) | .57 | 0.24 (0.08-0.67) | .006
Hypoxemia | 10 (30.3) | 2.45 (1.11-5.38) | .03 | 3.39 (1.51-7.62) | .003
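To make the hazard ratios in the table above more concrete, the following minimal sketch estimates a crude hazard ratio for a single covariate by maximizing the Cox partial likelihood with Newton-Raphson iterations. It is an illustration on hypothetical toy data, not the authors' R analysis: it handles only one covariate, uses the Breslow convention for tied times, reports no confidence interval, and the function name and data are assumptions.

```python
from math import exp


def cox_single_covariate(durations, events, x, iterations=50, tol=1e-10):
    """Return the log hazard ratio (beta) for one covariate x.

    Newton-Raphson on the Cox partial likelihood; tied event times
    are handled with the Breslow convention.
    """
    beta = 0.0
    for _ in range(iterations):
        score = 0.0
        information = 0.0
        for t_i, e_i, x_i in zip(durations, events, x):
            if e_i != 1:
                continue  # censored patients contribute only through risk sets
            # Risk set: everyone still under observation at time t_i.
            risk = [(x_j, exp(beta * x_j))
                    for t_j, x_j in zip(durations, x) if t_j >= t_i]
            s0 = sum(w for _, w in risk)
            s1 = sum(x_j * w for x_j, w in risk)
            s2 = sum(x_j * x_j * w for x_j, w in risk)
            mean_x = s1 / s0
            score += x_i - mean_x
            information += s2 / s0 - mean_x ** 2
        if information == 0.0:
            break
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical toy data (assumption): group 1 tends to die earlier than group 0.
days = [2, 3, 5, 7, 4, 6, 8, 9]
died = [1, 1, 1, 1, 1, 1, 1, 1]
group = [1, 1, 1, 1, 0, 0, 0, 0]

beta_hat = cox_single_covariate(days, died, group)
print(f"crude hazard ratio: {exp(beta_hat):.2f}")
```

A hazard ratio above 1 indicates a higher instantaneous risk of death in the group coded 1; a value below 1, as reported for hospital admission, indicates a lower risk.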
Cumulative incidence of death of patients with coronavirus disease grouped by (a) hospital admission, (b) time length between symptom onset and hospital admission, (c) hypoxemia, and (d) hypertension.
Cumulative incidence of death of patients with coronavirus disease grouped by (a) age group and (b) hospital admission and severity of illness. AC: admission, critically ill; AN: admission, not critically ill; NC: no admission, critically ill; NN: no admission, not critically ill.
R software output showing the association of computed tomography characteristics with outcomes of patients with coronavirus disease. The hazard ratios of each variable were obtained using proportional hazard Cox models after adjustment for age and sex. **
R software output showing the association of computed tomography characteristics with outcomes of patients with coronavirus disease adjusted for time between onset of symptoms and hospital admission. The hazard ratios of each variable were obtained using proportional hazard Cox models after adjustment for age and sex. Onset to admission refers to the number of days between symptom onset and hospital admission. OnsetToAdmit: onset to admission. *
Of the 227 patients with radiographic data, 25 (11.0%) had unilateral pneumonia, 198 (87.2%) had bilateral pneumonia, and 4 (1.8%) had diffuse pneumonia. Of the 224 patients whose pleural effusion status was recorded, 5 (2.2%) had pleural effusion; see
The Fisher exact tests suggested that lesion distribution differences were significant between patients who died and recovered (
Univariate Cox regression and Kaplan-Meier analysis also showed that patients with diffuse pneumonia had a significantly higher risk of death (
Radiographic characteristics of patients with coronavirus disease (n=227), n (%).
Computed tomography characteristic | Patient outcomes | | | P value
 | All | Recovered | Died |
Lesion distribution | | | | .045
Total | 227 (100.0) | 190 (83.7) | 37 (16.3) |
Unilateral | 25 (11.0) | 24 (12.6) | 1 (2.7) |
Bilateral | 198 (87.2) | 164 (86.3) | 34 (91.9) |
Diffuse | 4 (1.8) | 2 (1.1) | 2 (5.4) |
Lesion characteristics | | | | .78
Total | 222 (97.8) | 186 (83.8) | 36 (15.9) |
Ground-glass opacity | 120 (54.1) | 100 (53.8) | 20 (55.6) |
Patchy shadowing | 27 (12.2) | 22 (11.8) | 5 (13.9) |
Mixed | 70 (31.5) | 59 (31.7) | 11 (30.6) |
Predominant consolidation | 5 (2.3) | 5 (2.7) | 0 (0) |
Pleural effusion | | | | .18
Total | 224 (98.7) | 189 (83.3) | 35 (15.4) |
Negative | 219 (97.8) | 186 (98.4) | 33 (94.3) |
Positive | 5 (2.2) | 3 (1.6) | 2 (5.7) |
Histogram of lesion distribution among patients with coronavirus disease. NA: not applicable.
Cumulative incidence of death of patients with coronavirus disease grouped by (a) lesion distribution and (b) pleural effusion.
R software output showing the association of computed tomography characteristics with outcomes of patients with coronavirus disease. The hazard ratios of each variable were obtained using proportional hazard Cox models after adjustment for age and sex. ***
Lesion distribution on computed tomography (CT) images in patients with coronavirus disease pneumonia. (a) 60-year-old woman, unilateral lesion distribution; the axial CT image shows patchy shadowing in the right upper lobe. This patient recovered after 20 days of treatment in hospital. (b) 53-year-old man, bilateral lesion distribution; the axial CT image shows mixed lesions with ground-glass opacity and patchy shadowing in the bilateral lower lobes. This patient recovered after 30 days of treatment in hospital. (c) 68-year-old man, diffuse lesion distribution; the axial CT image shows ground-glass opacity in bilateral lungs. This patient died without treatment in hospital.
In this case series of COVID-19 patients reported on the internet in Wuhan, China, nearly one-third were older people: those aged ≥70 years represented 174/581 (29.9%) of cases. The overall mortality rate was 13.9% (83/599). Age is one of the most frequently reported prognostic factors in COVID-19, a finding that has been consistent across many recent studies worldwide [
In recovered patients, the median time from symptom onset to discharge and the length of hospital stay were much longer than the corresponding times to death in deceased patients, which may be due to the discharge standard for COVID-19 patients in China (afebrile for >3 days; improved respiratory symptoms; pulmonary imaging shows apparent absorption of inflammation; two consecutive negative nucleic acid tests for respiratory tract pathogen with a sampling interval ≥24 hours) [
The typical clinical characteristics presented here were consistent with recent studies, except that the incidence of cough was lower (40.8% vs 72.2%) [
As in other reports [
The predominant patterns of abnormality observed were bilateral opacity (198/227, 87.2%) and ground-glass opacity (120/222, 54.1%), which is consistent with another report [
Hospital admission and the time from illness onset to hospitalization were significant prognostic factors. Both univariate Cox regression and Kaplan-Meier analysis indicated that hospital admission and disease severity (critically ill or not) were associated with death risk (
We also plotted the mortality and admission rates of COVID-19 on the map across three periods. According to Pan et al [
Geographic distributions of mortality and hospital admission rates of coronavirus cases across three time periods in Wuhan, China. The mortality and admission rate of the cases were calculated using the number of deaths or admissions divided by the total number of cases reported in the area and time period. The data include all 13 districts of the city of Wuhan; regions with fewer than 5 cases were considered to be nonrepresentative and are plotted in grey.
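The rate calculation described in this caption can be sketched as follows: deaths and admissions are divided by the number of reported cases in each district and time period, and cells with fewer than 5 cases are flagged as nonrepresentative. The record layout, field names, district labels, and function name are assumptions for illustration, and the example records are hypothetical.

```python
from collections import defaultdict

MIN_CASES = 5  # cells with fewer cases are treated as nonrepresentative


def rates_by_region(cases):
    """Return {(district, period): (mortality_rate, admission_rate)}.

    cases: iterable of dicts with keys 'district', 'period', 'died', 'admitted'.
    Cells with fewer than MIN_CASES cases map to None (plotted in grey).
    """
    counts = defaultdict(lambda: [0, 0, 0])  # total, deaths, admissions
    for case in cases:
        cell = counts[(case["district"], case["period"])]
        cell[0] += 1
        cell[1] += int(case["died"])
        cell[2] += int(case["admitted"])

    rates = {}
    for key, (total, deaths, admissions) in counts.items():
        if total < MIN_CASES:
            rates[key] = None
        else:
            rates[key] = (deaths / total, admissions / total)
    return rates


# Hypothetical records (assumption): district "A" has 5 cases, district "B" only 2.
example_cases = (
    [{"district": "A", "period": 1, "died": i == 0, "admitted": i < 4} for i in range(5)]
    + [{"district": "B", "period": 1, "died": False, "admitted": True} for _ in range(2)]
)

example_rates = rates_by_region(example_cases)
print(example_rates)
```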
Our study suggests that the use of social media data can be effective in identifying patients at high risk for COVID-19, helping to coordinate appropriate treatment, and lowering the mortality rate. The use of social media can also reduce cross-infection risks by reducing the number of repeat visits of low-risk patients to the hospital. In the future, social media can be adopted to effectively help potentially critically ill patients seek timely medical treatment, help patients with low mortality risks to reduce unnecessary cross-infection, screen out critically ill patients in urgent need of hospitalization, and finally to facilitate disease control and hierarchical management. One limitation of this method is that it still requires particular definition and attention from the aspects of law, policy, and ethics; another limitation is that it requires active management and supervision procedures with participation of medical professionals to ensure its accuracy, effectiveness, and reasonableness.
This study has several limitations. First, the data were acquired on the internet and followed up via telephone. The nature of the data did not allow us to obtain more detailed information. Second, we did not obtain details regarding the patients’ laboratory characteristics, clinical course, or treatment. Also, some radiological files were not complete.
Hospital admission at an appropriate time is vital for patients with COVID-19, especially those who are critically ill. Older age, hypoxemia, and pleural effusion were associated with a poor prognosis. Public health measures such as transportation blocking and city closure should be combined with other measures, such as increasing admission rates and shortening wait times for treatment.
Currently, more than 2.9 billion individuals use social media regularly. Considering the substantially high speed, reach, penetration, and transparency of social media platforms, social media can be used not only to disseminate but also to collect critical information about a sudden outbreak of disease. Individual patients’ reports of their symptoms, clinical characteristics, treatment, and clinical outcomes on social media can be aggregated into big data and analyzed in real time to provide valuable insights to accelerate research speed [
Supplemental materials.
ACEI: angiotensin-converting enzyme inhibitor
ARB: angiotensin receptor blocker
ARD: acute respiratory distress
COVID-19: coronavirus disease
CT: computed tomography
ICU: intensive care unit
Rt: effective reproduction number
RT-PCR: reverse transcription–polymerase chain reaction
The study is supported by Young Backbone Teachers’ Overseas Study program under the State Scholarship Fund (File No. 201806015049) from WYY and Renmin University of China and by Double-First Class Innovation Research Funding from LD (File No. KYGJD2020002).
None declared.