An Artificial Intelligence Model to Predict the Mortality of COVID-19 Patients at Hospital Admission Time Using Routine Blood Samples: Development and Validation of an Ensemble Model

Background: COVID-19, which is accompanied by acute respiratory distress, multiple organ failure, and death, has spread worldwide much faster than previously thought; however, treatment options remain limited. Objective: To address this issue, we developed an artificial intelligence (AI) model for COVID-19, named EDRnet (ensemble learning model based on deep neural network and random forest models), to predict in-hospital mortality using a routine blood sample taken at the time of hospital admission. Methods: We selected 28 blood biomarkers and used the age and gender information of patients as model inputs. To improve mortality prediction, we adopted an ensemble approach combining deep neural network and random forest models. We trained our model with a database of blood samples from 361 COVID-19 patients in Wuhan, China, and applied it to 106 COVID-19 patients in three Korean medical institutions. Results: On the testing data set, EDRnet provided high sensitivity (100%), specificity (91%), and accuracy (92%). To extend the number of patient data points, we developed a web application (BeatCOVID19) where anyone can access the model to predict mortality and can register his or her own blood laboratory results. Conclusions: Our new AI model, EDRnet, accurately predicts mortality in COVID-19 patients. It is publicly available and aims to help health care providers fight COVID-19 and improve patients' outcomes.

In a pandemic, the most important issue in the management of patients diagnosed with COVID-19 is to identify patients at high risk of mortality early in the disease course and to provide appropriate treatment [2]. In particular, the condition of high-risk patients can deteriorate rapidly: some papers reported that deceased COVID-19 patients initially had mild symptoms but suddenly transitioned to a critical stage, leading to death [3-5]. In Italy, 75% of deceased patients showed mild symptoms, such as fever, dyspnea, and cough, at hospital admission [1]. Thus, developing a prognostic model that predicts mortality as early as possible is critical.
In this pandemic crisis, shortages of resources and medical staff strain the health care system; accordingly, artificial intelligence (AI) can aid in the management of COVID-19 patients. A recent study developed an AI model that predicts mortality from blood test results [6]. In that study, Yan et al initially considered 73 blood-borne markers for the mortality prediction model; ultimately, three blood biomarkers were selected: lactate dehydrogenase (LDH), lymphocyte count, and high-sensitivity C-reactive protein (hs-CRP). Their model predicted mortality with 90% accuracy using a decision tree built with an XGBoost classifier [7] to analyze feature importance.
However, Yan et al's study has drawbacks. First, the three biomarkers derived from the XGBoost-based feature selection may not be the best choices. Feature importance provides a score indicating how much each feature contributes to the construction of the decision trees within the model; however, due to the stochastic nature of machine learning algorithms, each feature's importance score may vary. Moreover, in decision tree algorithms such as XGBoost and random forest (RF), when multiple features yield the same gain during a split, the branch is made by randomly selecting among them. Second, numerous studies have shown that the disease progression of COVID-19 is associated not only with LDH [2,8-11], lymphocyte count [12,13], and hs-CRP [2,10,14-17] but also with other blood-based biomarkers, such as neutrophil count [16,18,19], albumin [18,20,21], and prothrombin activity [18,22-24]. In our study, we therefore developed an AI model that uses 28 biomarkers to predict the mortality of COVID-19 patients. Third, the three-biomarker AI model [6] predicted mortality 10 days before a patient's recovery or death, which suggests that the model may not work for COVID-19 patients who have just been diagnosed and hospitalized. Therefore, in this study, we aimed to develop a blood test-based AI model that predicts mortality at the early stage of hospital admission. We deployed the developed AI model on a public website so that all patients and medical staff can predict mortality using individual patient blood test results.

Data Sets
This study was approved by Wonkwang University Hospital (WKUH), Chonnam National University Hospital (CNUH), and Samsung Medical Center (SMC) in Korea; informed consent was waived. For training data, we used the blood test results of 375 COVID-19 patients collected between January 10, 2020, and February 24, 2020, at Tongji Hospital, Wuhan, China [6]. Of these, 14 patients without a blood test within 1 day after hospital admission were excluded, leaving 361 patients (212 males [58.7%] and 149 females [41.3%]; mean age 58.9 years, SD 16.5). As presented in Multimedia Appendix 1, the training data set of 361 patients included the admission date and time, discharge date and time, age, gender, mortality outcome, and results of blood tests obtained within 24 hours after hospital admission. For testing data, we collected medical records of COVID-19 patients (N=106) from three medical institutions: CNUH (85/106, 80.2%), WKUH (11/106, 10.4%), and SMC (10/106, 9.4%). The blood laboratory results from these 106 COVID-19 patients were collected between February 2020 and July 2020. As with the training data, we used the blood test data obtained within 24 hours after hospital admission (see Multimedia Appendix 2). To summarize the statistics of the two data sets, patients were classified into a survivor group and a deceased group. The number of blood tests differed across patients and institutions: the mean number of blood tests per patient was 61.21 (range 24-73) in the training data set and 35.36 (range 30-55) in the testing data set. The mean number of hospitalization days was 13.82 (survivor group) and 8.16 (deceased group) in the training data set and 18.21 (survivor group) and 17.98 (deceased group) in the testing data set (see Table 1).

Feature Selection
Given the total of 73 blood biomarkers in the training data, we performed an analysis of variance (ANOVA), which uses an F test to check for a significant difference between the two groups (ie, deceased vs survivor) for each blood biomarker. For the feature selection, we also considered the available data rate (ADR), which refers to how much of each blood biomarker's data were available for training the AI model. This is calculated as

ADR (%) = (N_biomarker / N_patients) × 100

where N_patients is the total number of patients (N=361) and N_biomarker is the number of patients with data available for the specific biomarker.
Based on the ANOVA, we first selected the top 32 biomarkers, corresponding to P values less than 10^-5. Subsequently, we excluded four biomarkers with ADR values of less than 90%. Table 2 summarizes the final selection of 28 biomarkers with the corresponding ANOVA P values and ADR values. The ANOVA P values and ADR values for all 73 biomarkers in the training data set are summarized in Multimedia Appendix 3, Table S1. The sample distributions of the selected 28 biomarkers in the survivor and deceased groups are presented in Multimedia Appendix 3, Figure S1.
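As an illustration, the two-stage selection (ANOVA F statistic between the deceased and survivor groups, followed by an ADR threshold) can be sketched as follows. The biomarker values below are toy numbers, not patient data; in practice, a statistics library (eg, scipy.stats.f_oneway) would supply the P values thresholded at 10^-5.

```python
# Sketch of the two-stage feature selection: a two-group ANOVA F statistic
# plus the available data rate (ADR). All values are illustrative.

def adr(values, n_patients):
    """Available data rate: percentage of patients with a non-missing value."""
    n_available = sum(v is not None for v in values)
    return 100.0 * n_available / n_patients

def anova_f(group_a, group_b):
    """One-way ANOVA F statistic for two groups (equals the squared t statistic)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    grand = (sum(group_a) + sum(group_b)) / (n_a + n_b)
    # Between-group sum of squares (1 degree of freedom for two groups)
    ss_between = n_a * (mean_a - grand) ** 2 + n_b * (mean_b - grand) ** 2
    # Within-group sum of squares (n_a + n_b - 2 degrees of freedom)
    ss_within = sum((x - mean_a) ** 2 for x in group_a) + \
                sum((x - mean_b) ** 2 for x in group_b)
    return ss_between / (ss_within / (n_a + n_b - 2))

# Hypothetical LDH-like values: well-separated groups yield a large F.
survivors = [250.0, 280.0, 300.0, 260.0]
deceased = [600.0, 650.0, 700.0]
f_stat = anova_f(survivors, deceased)
# One missing value among 7 patients -> ADR below the 90% cutoff.
ldh_adr = adr([250.0, None, 300.0, 260.0, 600.0, 650.0, 700.0], n_patients=7)
print(round(f_stat, 1), round(ldh_adr, 1))
```

A biomarker would be kept only if both criteria hold: a sufficiently small ANOVA P value and an ADR of at least 90%.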

Preprocessing
Given the selected 28 biomarkers, the mean number of available biomarkers per patient was 27.22 (SD 2.33) for the training data and 16.86 (SD 1.58) for the testing data, as summarized in Table 3. To handle missing data, we calculated the mean value of each biomarker from the training data and replaced missing entries with that mean in both the training and testing data sets. We then added two more features (ie, age and gender) to the 28 biomarkers and trained our AI model using 30 features. With the 30 features, we performed data set standardization, a common requirement for machine learning estimators. The standardization rescales each feature to zero mean and unit standard deviation as

z = (x - mean(train)) / SD(train)

where mean(train) and SD(train) are the mean and standard deviation, respectively, of each feature in the training data. The standardization was applied to both the training and testing data sets.
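The preprocessing described above can be sketched minimally as follows: per-feature mean imputation using training-set statistics, followed by z-score standardization with the training mean and SD. The small matrices are illustrative, not real patient data.

```python
# Minimal sketch of the preprocessing pipeline: mean imputation and
# standardization, both computed from the training data only.

def column_stats(rows, j):
    """Mean and SD of column j over the training rows, ignoring missing values."""
    vals = [r[j] for r in rows if r[j] is not None]
    mean = sum(vals) / len(vals)
    sd = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5
    return mean, sd

def preprocess(train_rows, rows):
    """Impute missing entries with the training mean, then standardize."""
    n_features = len(train_rows[0])
    stats = [column_stats(train_rows, j) for j in range(n_features)]
    out = []
    for r in rows:
        out.append([
            ((r[j] if r[j] is not None else stats[j][0]) - stats[j][0]) / stats[j][1]
            for j in range(n_features)
        ])
    return out

train = [[1.0, 10.0], [3.0, None], [5.0, 30.0]]  # None marks a missing value
test = [[None, 20.0]]
z_train = preprocess(train, train)
z_test = preprocess(train, test)  # test rows use training statistics only
print(z_test)
```

Note that an imputed entry standardizes to exactly 0, since it is replaced by the training mean before centering.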

Development of an Ensemble AI Model
As illustrated in Figure 1, the new ensemble AI model is composed of a 5-layer deep neural network (DNN) and an RF model. Our ensemble AI model was named EDRnet (ensemble learning model based on DNN and RF models). The 5-layer DNN comprised an input layer, three fully connected (FC) layers, and an output layer. The input layer contained 30 features, including the 28 biomarkers, age, and gender. The input layer was fed into three FC layers in series, consisting of 30, 16, and 8 nodes, respectively. To alleviate overfitting, we applied a dropout rate of 0.3. The last FC layer was fed into a softmax layer, an output layer providing the probabilities of patient mortality. Figure S2 in Multimedia Appendix 3 shows our DNN model and its printed textual summary run on Keras, where the total number of parameters (ie, weights and biases) was 1571. In the training of both models, a 10-time-repetition 10-fold stratified cross-validation was separately performed, and the predicted mortality probabilities of the DNN model, p(DNN), and the RF model, p(RF), were calculated. The final predicted mortality probability of the ensemble model, p(EDR), was obtained by soft voting based on p(DNN) and p(RF).
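The soft-voting step combining the two base models can be sketched as follows. The probabilities below are illustrative; the 0.5 decision threshold is an assumption for the sketch.

```python
# Sketch of soft voting: p(EDR) is the average of the predicted mortality
# probabilities p(DNN) and p(RF), thresholded for the final class label.

def soft_vote(p_dnn, p_rf, threshold=0.5):
    """Average the base-model probabilities and apply the decision threshold."""
    p_edr = (p_dnn + p_rf) / 2.0
    return p_edr, int(p_edr >= threshold)

# Hypothetical outputs of the two base models for one patient.
p_edr, label = soft_vote(p_dnn=0.80, p_rf=0.62)
print(p_edr, label)
```

Averaging probabilities (soft voting) rather than class labels (hard voting) lets a confident base model outweigh an uncertain one near the decision boundary.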
For the 5-layer DNN, a 10-time-repetition 10-fold stratified cross-validation was performed to confirm the model's generalization ability. The training data (N=361) were randomly shuffled and partitioned into 10 equal subgroups in a stratified manner. Of the 10 subgroups, a single subgroup was retained as the validation data set for testing the model, and the remaining nine subgroups were used as the training data set. The process was then repeated 10 times, with each of the 10 subgroups used exactly once as the validation data set. By repeating this stratified 10-fold cross-validation process 10 times, a total of 100 models of the 5-layer DNN were derived. Each model was trained with the binary cross-entropy loss

L = -(1/N) Σ [y_i log p(y_i) + (1 - y_i) log(1 - p(y_i))]

where y_i is the label (ie, 1 for deceased and 0 for survived), p(y_i) is the predicted probability of each patient being deceased, and N is the batch size. We then ensembled the 100 models with a weighted average of their predicted probabilities.
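The index generation for one repetition of the stratified 10-fold split can be sketched as follows (the class-balancing strategy here, per-class shuffling with round-robin assignment, is one common implementation, not necessarily the paper's exact procedure; library routines such as scikit-learn's StratifiedKFold behave similarly).

```python
# Sketch of stratified k-fold index generation: each fold preserves the
# class ratio, and every sample is used exactly once as validation data.

import random

def stratified_kfold(labels, k, seed):
    """Yield (train_idx, val_idx) pairs via per-class round-robin assignment."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)  # reshuffling per repetition varies the partition
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val

labels = [1] * 20 + [0] * 80  # toy data with a deceased/survivor imbalance
splits = list(stratified_kfold(labels, k=10, seed=0))
# Repeating this 10 times with fresh shuffles yields the 100 DNN models.
print(len(splits), len(splits[0][1]), sum(labels[i] for i in splits[0][1]))
```

With 20 positives among 100 samples, every validation fold contains exactly 2 positives, preserving the 20% class ratio.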

Performance Evaluation of AI Models
To evaluate the performance of the AI models in predicting mortality, we used the sensitivity, specificity, accuracy, and balanced accuracy metrics, defined as

sensitivity = TP / (TP + FN)
specificity = TN / (TN + FP)
accuracy = (TP + TN) / (TP + TN + FP + FN)
balanced accuracy = (sensitivity + specificity) / 2

where TP, TN, FP, and FN represent the numbers of true positives, true negatives, false positives, and false negatives, respectively.
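The four metrics follow directly from confusion-matrix counts; a direct transcription of the definitions above, with illustrative counts (not the paper's), is:

```python
# Evaluation metrics computed from confusion-matrix counts.

def metrics(tp, tn, fp, fn):
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    balanced_accuracy = (sensitivity + specificity) / 2.0
    return sensitivity, specificity, accuracy, balanced_accuracy

# Toy counts for a 110-patient test set (illustrative only).
sens, spec, acc, bacc = metrics(tp=8, tn=90, fp=10, fn=2)
print(sens, spec, round(acc, 2), round(bacc, 2))
```

Balanced accuracy matters here because mortality is rare: a model that predicts "survivor" for everyone can reach high plain accuracy while its sensitivity, and hence its balanced accuracy, collapses.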
In the training data set, the prediction performance of the AI models was evaluated based on a 10-time-repetition 10-fold stratified cross-validation. In the testing data set, the prediction performance of the AI models was independently evaluated.
To compare the performance of our proposed EDRnet model with those of other external AI models, we separately trained XGBoost and AdaBoost (AB) models, each of which was evaluated as a single model and as an ensemble model combined with the DNN, resulting in four models: XGBoost, AB, ensemble of DNN and XGBoost (EDX), and ensemble of DNN and AB (EDA). For the training of these models, we searched for the optimal hyperparameters providing the highest validation accuracy, as presented in Multimedia Appendix 3, Table S2. In addition, we adopted the recently published AI model by Yan et al [6], a decision tree with XGBoost-based feature selection, for performance comparison. All five external AI models were evaluated using our testing data set of 106 patients.

Results
The cross-validation of RF, DNN, and our ensemble model EDRnet showed validation accuracies of 89% for RF, 92% for DNN, and 93% for EDRnet; thus, EDRnet provided the highest sensitivity, specificity, accuracy, and balanced accuracy values (see Table 4). In the independent testing data set, EDRnet achieved 100% sensitivity, 91% specificity, and 92% accuracy (see Table 5). The computational times of DNN and RF in EDRnet for training were 796 and 126 seconds, respectively, and the overall computational time for testing EDRnet was 72 seconds. In the comparison with the external AI models, EDRnet achieved the best performance, with a balanced accuracy of 96%; notably, the accuracy of Yan et al's model [6] was only 36%, indicating that a few blood markers may not be sufficient to predict patient mortality (see Table 6). Our proposed EDRnet model used 28 blood biomarkers for prediction, but it does not require all 28. In our testing data set, EDRnet was evaluated with the available biomarkers for each patient, ranging from 14 to 24 (see Figure 2). The majority of patients had 19 to 21 available biomarkers (ie, 19 in 15 patients, 20 in 41 patients, and 21 in 22 patients), with similarly high prediction accuracies (93%, 95%, and 86%, respectively). For patients with 17 and 18 available biomarkers, the accuracy was 75% and 50%, respectively; by contrast, patients with 14 to 16 biomarkers showed high accuracies ranging from 83% to 100%. To further investigate the effect of the number of available biomarkers, we estimated accuracy as a function of the number of available biomarkers (see Figure 3). For this estimation, we randomly selected 1 to 20 biomarkers from all of the testing data points and tested the model with 100 repetitions; only samples whose actual number of available biomarkers was equal to or greater than the number randomly selected were simulated. The results show that accuracy increases with the number of available biomarkers until reaching 19 biomarkers.
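The biomarker-availability simulation described above can be sketched as follows: randomly keep k biomarkers per test sample, impute the rest with training means, and average the accuracy over repeated draws. Everything here is a toy stand-in; in particular, `model_predict` is a hypothetical placeholder for EDRnet, and the samples are not patient data.

```python
# Sketch of the random biomarker-masking experiment behind Figure 3.

import random

def mask_to_k_biomarkers(sample, k, rng):
    """Keep k randomly chosen entries of the sample; mark the rest missing."""
    keep = set(rng.sample(range(len(sample)), k))
    return [v if j in keep else None for j, v in enumerate(sample)]

def simulate_accuracy(samples, labels, train_means, k, model_predict,
                      n_reps=100, seed=0):
    """Mean accuracy over n_reps random draws of k available biomarkers."""
    rng = random.Random(seed)
    accs = []
    for _ in range(n_reps):
        correct = 0
        for x, y in zip(samples, labels):
            masked = mask_to_k_biomarkers(x, k, rng)
            # Missing entries fall back to the training-set mean, as in the
            # preprocessing step.
            imputed = [m if m is not None else train_means[j]
                       for j, m in enumerate(masked)]
            correct += int(model_predict(imputed) == y)
        accs.append(correct / len(samples))
    return sum(accs) / n_reps

# Toy classifier: predict death when the mean standardized value exceeds 0.
toy_predict = lambda x: int(sum(x) / len(x) > 0.0)
samples = [[1.0] * 6, [-1.0] * 6]   # one "deceased-like", one "survivor-like"
means = [0.0] * 6                   # standardized features have zero mean
acc = simulate_accuracy(samples, [1, 0], means, k=3, model_predict=toy_predict)
print(acc)
```

Because imputed entries are neutral (the training mean), accuracy in such a simulation typically rises as k grows and more informative biomarkers survive the masking.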
Furthermore, our developed AI model, EDRnet, was successfully deployed on a public website [25] so that anyone can predict mortality using individual blood test results. The web application provides the predicted mortality probability, as shown in Figure 4: a user inputs his or her blood sample results (see Figure 4a), and the predicted mortality results are then presented (see Figure 4b). Currently, the web application does not store any information entered by users; however, we plan, with user consent, to store entered information to improve the AI model via a real-time learning process. Regarding clinical characteristics (see Table 7), there were no significant differences in comorbidity. In terms of initial symptoms, the deceased group had dyspnea more frequently than the survivor group (66.7% vs 16.8%; P=.04). All patients in the deceased group required oxygen supply. The deceased group had altered mentality more frequently than the survivor group (50.0% vs 1.0%; P=.02). There was no significant difference in antiviral drugs (ie, lopinavir or ritonavir, chloroquine or hydroxychloroquine, ribavirin, remdesivir, and oseltamivir) or anti-inflammatory drugs (ie, interferon, dexamethasone, and methylprednisolone) between the deceased and survivor groups. However, the deceased group received antibiotics or combination therapy more often.

Table 7. Clinical characteristics of the patient groups from the testing data set.

Principal Findings
Our new AI model, EDRnet, was able to predict the mortality of COVID-19 patients using 28 blood biomarkers obtained within 24 hours after hospital admission. In the independent testing data sets, EDRnet showed excellent prediction performance with high sensitivity (100%), specificity (91%), and accuracy (92%). We were able to improve the prediction performance by adopting the ensemble approach combining DNN and RF models. Of note, EDRnet was developed by training with Chinese patients' data and testing with Korean patients' data.
EDRnet has several advantages. First, EDRnet can predict which patients are at high risk of mortality in the early stage of hospital admission (ie, within 24 hours after admission). This is a substantial improvement over the prior AI prediction model reported by Yan et al, which predicted mortality 10 days before the occurrence of survival or death [6]. Mortality prediction at the time of admission can be substantially informative for clinicians because the critical period of disease progression is 10 to 14 days from the onset of symptoms, according to previous studies [13,16,26]. EDRnet can thus provide treatment priority guidance regarding who should be treated intensively. Second, EDRnet uses only blood biomarkers to predict mortality. In general, COVID-19 patients undergo blood laboratory tests at the time of hospital admission [9,27]. Blood biomarkers are objective indices that estimate patients' conditions in a quantitative manner, which may help assure the reliability of the AI model. We did not include subjective indices, such as symptoms, or predisposing factors, such as underlying comorbidities, because these are difficult to quantify and may show high variability between patients. Third, the clinical meaning and significance of the blood biomarkers used in our EDRnet model have been well investigated in many prior clinical studies. Thus, the AI's predicted mortality results are explainable and easily understood by doctors. Several of the major blood biomarkers used in our EDRnet model are discussed below.
Hematological changes in lymphocytes, neutrophils, monocytes, eosinophils, and platelets are common, as these changes are related to viral replication and hyperinflammation in COVID-19 infection [12,13]. In severe cases, infiltration and sequestration of CD4+/CD8+ T cells occur, leading to a decrease in peripheral lymphocytes. Neutrophil counts [19-21] were significantly higher in the severe group than in the mild group. Platelet count, platelet volume, and platelet large-cell ratio are related to COVID-19 infection because immunologic destruction can lead to inappropriate platelet activation and consumption as well as impaired megakaryopoiesis [28-30].
Hypoalbuminemia [18,20,21], hypocalcemia [31-33], and elevated aspartate aminotransferase [18] are highly associated with severe COVID-19 infection requiring intensive care unit admission. Urea and the estimated glomerular filtration rate are important laboratory findings associated with underlying chronic renal disease, a well-known predisposing factor for mortality [34]. In terms of the coagulation profile, COVID-19 generally presents a hypercoagulable state, resulting in an elevated prothrombin time and international normalized ratio in severe cases [3,18].
In this study, no significant differences were observed between the deceased and survivor groups in the use of pharmacologic agents, including antiviral drugs such as remdesivir, except for antibiotics. Antibiotics or combination therapy is usually used for suspected bacterial superinfection, which indicates severe disease. To date, there has been no proven effective pharmacologic agent for COVID-19, and pharmacologic treatment was not significantly related to survival in this study.
EDRnet does not require all 28 blood biomarkers to predict mortality; it worked well as long as at least 19 blood biomarkers were available at the time of admission. Compared to the prior AI prediction model for COVID-19 mortality, which used three biomarkers, there might be a concern that EDRnet requires too many biomarkers. However, these blood tests are commonly performed in daily clinical practice for hospitalized COVID-19 patients. As more data accumulate, we may be able to reduce the number of blood biomarkers needed for mortality prediction.

Limitations and Future Work
Our study has several limitations. First, the number of patients available for testing was small. According to the Johns Hopkins Coronavirus Resource Center, the mortality rate in South Korea is 1.7%; in our testing data set of 106 Korean patients, the mortality rate was 1.9%, which is almost equivalent to the actual mortality rate. It might be necessary to update EDRnet by training it with a large population data set from all over the world. To facilitate this, we made a web application [25] so that anyone can access the model; we believe that opening the AI model to the public will help improve its performance and generalizability. Second, our data did not include other races, such as Caucasian or Middle Eastern patients. Our future research plan is to establish a real-time AI training system that can continue to train our model using prospectively collected data from around the world. In addition, we will upgrade the web application so that the database framework allows a user to input his or her blood sample results along with the outcome. Based on the extended data, we will improve EDRnet for better generalization.

Conclusions
In conclusion, our new AI model, EDRnet, was developed to predict the mortality of COVID-19 patients at the time of hospital admission using blood biomarkers only. It is now open to the public with the hope that it can help health care providers fight COVID-19 and improve patients' outcomes.