This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
Several models have been developed to predict mortality in patients with COVID-19 pneumonia, but only a few have demonstrated enough discriminatory capacity. Machine learning algorithms represent a novel approach for the data-driven prediction of clinical outcomes with advantages over statistical modeling.
We aimed to develop a machine learning–based score—the Piacenza score—for 30-day mortality prediction in patients with COVID-19 pneumonia.
The study comprised 852 patients with COVID-19 pneumonia, admitted to the Guglielmo da Saliceto Hospital in Italy from February to November 2020. Patients’ medical history, demographics, and clinical data were collected using an electronic health record. The overall patient data set was randomly split into derivation and test cohorts. The score was obtained through the naïve Bayes classifier and externally validated on 86 patients admitted to Centro Cardiologico Monzino (Italy) in February 2020. Using a forward-search algorithm, 6 features were identified: age, mean corpuscular hemoglobin concentration, PaO2/FiO2 ratio, temperature, previous stroke, and gender. The Brier index was used to evaluate the ability of the machine learning model to stratify and predict the observed outcomes. A user-friendly website was designed and developed to enable fast and easy use of the tool by physicians. Regarding the customization properties of the Piacenza score, we added a tailored version of the algorithm to the website, which enables an optimized computation of the mortality risk score for a patient when some of the variables used by the Piacenza score are not available. In this case, the naïve Bayes classifier is retrained over the same derivation cohort but using a different set of patient characteristics. We also compared the Piacenza score with the 4C score and with a naïve Bayes algorithm with 14 features chosen a priori.
The Piacenza score exhibited an area under the receiver operating characteristic curve (AUC) of 0.78 (95% CI 0.74-0.84, Brier score=0.19) in the internal validation cohort and 0.79 (95% CI 0.68-0.89, Brier score=0.16) in the external validation cohort, showing accuracy comparable to that of the 4C score and of the naïve Bayes model with a priori chosen features, which achieved AUCs of 0.78 (95% CI 0.73-0.83, Brier score=0.26) and 0.80 (95% CI 0.75-0.86, Brier score=0.17), respectively.
Our findings demonstrated that a customizable machine learning–based score with a purely data-driven selection of features is feasible and effective for the prediction of mortality among patients with COVID-19 pneumonia.
Despite enormous efforts to limit the spread of COVID-19, over 100 million people have been confirmed positive for SARS-CoV-2 infection and more than 2 million people have died from the virus worldwide, as of February 10, 2021 [
Data from epidemiological studies suggest that severe illness occurs in approximately 20% of patients and that older age, coexisting medical conditions, and cardiovascular risk factors are associated with worse prognosis [
To date, several prognostic models combining clinical and laboratory parameters have been proposed, but they mainly included patients from the first wave of the COVID-19 pandemic. This may introduce bias, making these models unsuitable for clinical decision-making in daily practice [
The growing adoption of electronic health record (EHR) systems has increased the availability of large amounts of data suitable for machine learning analysis. Machine learning has already proven its potential to support clinical decisions in many medical fields, including during the COVID-19 pandemic [
We hypothesized that a machine learning score based on data-driven selection of features, unlike inferential statistics, could capture nonlinear relationships among clinical features without human-biased intervention and predict mortality for individual patients more accurately than the currently available risk scores.
The study was conducted at Guglielmo da Saliceto Hospital, which serves a population of about 300,000 people in the area of Piacenza, Emilia Romagna, in northern Italy. This region has the second highest number of COVID-19 deaths in the country (6219 as of December 7, 2020).
This study retrospectively analyzed the EHRs of a cohort of 852 patients diagnosed with COVID-19 pneumonia according to the World Health Organization interim guidance and admitted to the hospital from February to November 2020. COVID-19 infection was diagnosed by a positive result on a reverse transcriptase–polymerase chain reaction (RT-PCR) assay of a specimen collected on a nasopharyngeal swab. Pregnant women, children (<18 years), and patients with a negative RT-PCR assay were excluded from the study as well as patients presenting with shock and coma.
Data collected in the EHR included patients’ demographic information, comorbidities, triage vitals, and laboratory tests and outcomes (including length of stay, discharge, readmission, and mortality). Routine blood examinations at admission comprised complete blood count, coagulation profile, and serum biochemical tests (including renal and liver function, creatine kinase, lactate dehydrogenase, electrolytes, and C-reactive protein). A total of 62 patient characteristics were considered in the score design and development. The study protocol was approved by the local committee on human research.
The criteria for discharge were at the discretion of the caregiver physician. In most cases, the criteria encompassed absence of fever for at least 3 days, as well as substantial clinical improvement including clinical remission of symptoms and 2 throat-swab samples negative for SARS-CoV-2 RNA obtained at least 24 hours apart. The primary outcome was 30-day in-hospital mortality.
The Piacenza score is a machine learning–based COVID-19 mortality risk predictor. It was implemented using a naïve Bayes approach, a probabilistic classifier that models the dependence between the outcome and each patient variable, taken separately from the others. The naïve Bayes algorithm was chosen for the following advantages: (1) it provides a probability of the final outcome, which thus represents the mortality risk; (2) it can handle both categorical and continuous features; and (3) it can handle missing values, thus providing a mortality risk even when not all input variables are available for a patient. Moreover, it has proved a successful approach in predicting clinical outcomes in several medical scenarios [
Its major limitation lies in the assumption at the core of the method: feature independence. Even though this assumption is almost never satisfied, the classifier has been shown to reach reasonable results in many scenarios, especially in text classification. Another drawback of naïve Bayes is that if a categorical feature takes a value in the test data set that was not observed in the training data set, the model will be unable to make a prediction. Nevertheless, this issue can be solved with various smoothing techniques. Patients missing some features can be easily handled: training a naïve Bayes classifier only requires computing each feature’s probability distribution, which can be estimated from the patients in whom that feature was observed. Thus, no imputation was performed and all patients were included in the training phase, with missing values simply left out of the computation for the corresponding feature. Furthermore, when applying the trained model to make inferences, the end user can leave some inputs missing and still obtain a reliable result.
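The handling of missing inputs described above can be sketched as follows. This is a minimal illustration that assumes Gaussian likelihoods for continuous features (the likelihood family is not stated in the text, so this is an assumption); the class name `MissingAwareGaussianNB` and the synthetic data are hypothetical and are not the study's code or data.

```python
import numpy as np

class MissingAwareGaussianNB:
    """Gaussian naive Bayes that skips NaN entries feature by feature (illustrative)."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        # Class priors from the outcome frequencies.
        self.log_prior_ = np.log([np.mean(y == c) for c in self.classes_])
        # Per-class, per-feature mean and variance, using observed values only.
        self.mean_ = np.array([np.nanmean(X[y == c], axis=0) for c in self.classes_])
        self.var_ = np.array([np.nanvar(X[y == c], axis=0) for c in self.classes_]) + 1e-9
        return self

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        observed = ~np.isnan(X)                      # shape (n_samples, n_features)
        X_filled = np.where(observed, X, 0.0)        # placeholder, masked out below
        diff = X_filled[:, None, :] - self.mean_[None, :, :]
        log_lik = -0.5 * (np.log(2 * np.pi * self.var_)[None] + diff ** 2 / self.var_[None])
        # A missing feature contributes nothing to the class log-likelihood.
        log_lik = np.where(observed[:, None, :], log_lik, 0.0)
        log_post = self.log_prior_[None, :] + log_lik.sum(axis=2)
        log_post -= log_post.max(axis=1, keepdims=True)   # numerical stability
        post = np.exp(log_post)
        return post / post.sum(axis=1, keepdims=True)

# Synthetic data (NOT study data): columns are age and PaO2/FiO2 ratio.
rng = np.random.default_rng(0)
age = np.r_[rng.normal(65, 14, 200), rng.normal(78, 10, 100)]
pf = np.r_[rng.normal(270, 84, 200), rng.normal(197, 93, 100)]
X = np.c_[age, pf]
y = np.r_[np.zeros(200), np.ones(100)]
X[rng.random(X.shape) < 0.1] = np.nan             # knock out about 10% of the values

model = MissingAwareGaussianNB().fit(X, y)
# Three hypothetical patients: one with a missing ratio, one complete, one fully missing.
p = model.predict_proba([[85, np.nan], [50, 300], [np.nan, np.nan]])
```

In this sketch, a patient with no observed inputs simply receives the class prior, which illustrates why no imputation step is needed.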
The EHRs of 852 patients were randomly split into derivation (70%) and test (30%) cohorts. The derivation cohort was first used to select, among the considered 62 patient features, the most significant ones, and then to train the naïve Bayes classifier using only the best predictors, while the predictive ability of the estimated model was assessed on the test cohort.
The Piacenza score has been developed and tailored to (1) minimize the number of clinical variables required as input and (2) maximize the overall prediction performance (ie, the area under the receiver operating characteristic curve [AUC]) and patient stratification ability. The most significant patient features were identified through the so-called forward-search approach [
The forward-search approach is a purely data-driven dimensionality reduction technique that, given a large set of input features, identifies the smallest combination of those features that maximizes the performance metrics associated with a machine learning algorithm. It was employed here to reduce the number of patient variables from 62 to the 6 most relevant ones, which were then used to train the naïve Bayes classifier.
The test cohort was used to assess the performance of the Piacenza score. To quantify the statistical variability of the results, bootstrapping was used to randomly generate 100 test sets from the original test cohort. Moreover, an external validation cohort was used to further validate the performance of the Piacenza score. The external validation cohort consisted of data from 86 patients with COVID-19 enrolled at Centro Cardiologico Monzino Hospital (Milan, Italy).
The performance of the Piacenza score was evaluated in terms of discrimination and calibration. Discrimination was determined by computing the receiver operating characteristic (ROC) curve on the test cohort and the associated AUC, together with its 95% CI. As additional metrics, the negative predictive value (NPV), the positive predictive value (PPV), the accuracy, the sensitivity, the specificity, and the F1 and F2 scores were computed. These metrics were calculated at the threshold value that maximized the F2 score. Calibration was assessed with calibration plots, which compare observed and predicted outcomes with associated uncertainties. The Brier index was used to evaluate the ability of the machine learning model to stratify and predict observed outcomes. The Brier index is defined as the mean squared difference between the observed and predicted outcomes and ranges from 0 to 1, with 0 representing the best calibration.
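A greedy forward search of this kind can be sketched as below. The stopping rule (`min_gain`), the use of a held-out validation split, and the helper names `auc`, `nb_log_odds`, and `forward_search` are assumptions made for illustration; the text does not specify these details, and the data are synthetic.

```python
import numpy as np

def auc(y_true, score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    y_true, score = np.asarray(y_true), np.asarray(score, dtype=float)
    diff = score[y_true == 1][:, None] - score[y_true == 0][None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

def nb_log_odds(X_fit, y_fit, X_eval):
    """Gaussian naive Bayes log-odds of the positive class (complete data only)."""
    out = np.full(len(X_eval), np.log(y_fit.mean() / (1 - y_fit.mean())))
    for j in range(X_fit.shape[1]):
        for cls, sign in ((1, 1.0), (0, -1.0)):
            col = X_fit[y_fit == cls, j]
            mu, var = col.mean(), col.var() + 1e-9
            out += sign * (-0.5 * np.log(2 * np.pi * var) - (X_eval[:, j] - mu) ** 2 / (2 * var))
    return out

def forward_search(X_fit, y_fit, X_val, y_val, min_gain=0.005):
    """Add, one at a time, the feature with the largest validation AUC gain."""
    selected, best = [], 0.5
    remaining = list(range(X_fit.shape[1]))
    while remaining:
        scores = {
            j: auc(y_val, nb_log_odds(X_fit[:, selected + [j]], y_fit, X_val[:, selected + [j]]))
            for j in remaining
        }
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best < min_gain:
            break                                  # no candidate improves enough: stop
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return selected, best

# Synthetic data (NOT study data): 5 candidate features, 2 of them informative.
rng = np.random.default_rng(1)
n = 600
y = (rng.random(n) < 0.35).astype(int)
X = rng.normal(size=(n, 5))
X[:, 0] += 2.0 * y                                 # strongly informative
X[:, 2] -= 0.8 * y                                 # weakly informative
fit, val = slice(0, 400), slice(400, n)
selected, val_auc = forward_search(X[fit], y[fit], X[val], y[val])
```

Each round retrains the classifier once per remaining candidate, which is why the cost grows quickly with the number of candidate features.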
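The bootstrap step can be sketched as follows. The text states that 100 test sets were generated and that the median AUC is reported; taking the 95% CI from the 2.5th and 97.5th percentiles is an assumption of this sketch, as are the function names and the synthetic risks.

```python
import numpy as np

def auc(y_true, score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    y_true, score = np.asarray(y_true), np.asarray(score, dtype=float)
    diff = score[y_true == 1][:, None] - score[y_true == 0][None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

def bootstrap_auc(y_true, risk, n_boot=100, seed=0):
    """Median AUC and percentile interval over resampled copies of the test cohort."""
    rng = np.random.default_rng(seed)
    y_true, risk = np.asarray(y_true), np.asarray(risk, dtype=float)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, size=n)           # sample patients with replacement
        if y_true[idx].min() == y_true[idx].max():
            continue                               # AUC needs both outcomes present
        aucs.append(auc(y_true[idx], risk[idx]))
    lo, med, hi = np.percentile(aucs, [2.5, 50, 97.5])
    return med, (lo, hi)

# Synthetic predicted risks for a test cohort of 255 patients (NOT study data).
rng = np.random.default_rng(42)
y = (rng.random(255) < 0.34).astype(int)
risk = np.clip(0.3 + 0.25 * y + rng.normal(0, 0.2, 255), 0, 1)
median_auc, (ci_low, ci_high) = bootstrap_auc(y, risk)
```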
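As a small illustration of two of these metrics, the sketch below computes the Brier index and picks the risk threshold that maximizes the F2 score. The function names and the 8 toy patients are hypothetical; scanning every observed risk value as a candidate threshold is one simple choice, not necessarily the procedure used in the study.

```python
import numpy as np

def brier_score(y_true, risk):
    """Mean squared difference between predicted risk and observed outcome (0 or 1)."""
    return float(np.mean((np.asarray(risk, dtype=float) - np.asarray(y_true)) ** 2))

def f_beta(y_true, y_pred, beta=2.0):
    """F-beta from counts: (1+b^2)TP / ((1+b^2)TP + b^2 FN + FP)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    denom = (1 + beta ** 2) * tp + beta ** 2 * fn + fp
    return float((1 + beta ** 2) * tp / denom) if denom else 0.0

def best_threshold(y_true, risk, beta=2.0):
    """Return the observed risk value that maximizes F-beta when used as a cutoff."""
    y_true, risk = np.asarray(y_true), np.asarray(risk, dtype=float)
    candidates = np.unique(risk)
    scores = [f_beta(y_true, (risk >= t).astype(int), beta) for t in candidates]
    best = int(np.argmax(scores))
    return float(candidates[best]), scores[best]

# Toy example with 8 patients (NOT study data).
y = np.array([0, 0, 1, 0, 1, 1, 0, 1])
risk = np.array([0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.15, 0.90])
brier = brier_score(y, risk)
threshold, f2 = best_threshold(y, risk)
```

Because F2 weights false negatives four times as heavily as false positives, the selected cutoff tends to be low, which matches the low thresholds reported for the scores.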
Finally, the variable relative importance was quantified for the identified 6 most relevant patient features. The relative importance is a comparative measure of the patient feature’s weight in determining the Piacenza risk score.
The Piacenza score was specifically designed to be an easy, fast, versatile, fair, open, and user-friendly tool. To reach this goal, a web-based calculator of the score was released [
We added a tailored version of the algorithm to the website, which enables an optimized computation of the mortality risk score for a patient even when some variables used by the Piacenza score are not available. In this case, the naïve Bayes classifier is retrained over the same derivation cohort but using a different set of patient characteristics. Moreover, a second naïve Bayes model has been presented as a possible example of the Piacenza score’s customization and flexibility. This model was trained with the following 14 variables, chosen a priori by the physician for their association with mortality in COVID-19 pneumonia: age, gender, diabetes, duration of symptoms before hospital admission, systolic blood pressure, respiratory rate, PaO2/FiO2 ratio, platelet and eosinophil counts, neutrophil-to-lymphocyte ratio, C-reactive protein, direct bilirubin, creatinine, and lactate dehydrogenase. Finally, we compared the performance of the Piacenza score with this “clinical” naïve Bayes classifier to show the flexibility of the method, which can be easily retrained with another subset of predictors.
The website was developed in Python (Python Software Foundation) using the Flask framework, and hosting is managed through Docker.
The site consists of three main pages:
On the
To further assess the performance of the Piacenza score, we compared it with the 4C mortality score, which considers the following predictors: age, gender, number of comorbidities, respiratory rate, peripheral oxygen saturation (sO2), level of consciousness (Glasgow coma scale), urea level, and C-reactive protein. The same test cohort used to test the Piacenza score was employed.
Categorical variables were reported as count (%) and continuous variables as mean (SD). A two-sided
A total of 852 patients with SARS-CoV-2 pneumonia were hospitalized during the study period, of whom 242 (28%) were admitted to the intensive care unit (ICU). The mean age of the patients was 70 (SD 14) years, and 599 (70%) were male. Comorbidities were present in 602 patients (71%), mainly arterial hypertension (n=499, 59%), dyslipidemia (n=205, 24%), and diabetes (n=157, 18%). The mean time between onset of symptoms and hospital admission was 6.5 (SD 3.9) days. Fever (n=776, 91%), dyspnea (n=543, 64%), and cough (n=400, 47%) were the most common symptoms at admission. A total of 293 patients (34%) died within 30 days after hospital admission. The median time from hospital admission to discharge or death was 9 days. A comparison of clinical characteristics between survivors and nonsurvivors showed that the latter were older
Study population characteristics and a comparison of survivors and nonsurvivors.
Characteristic | All patients (N=852) | Patients discharged alive (n=559) | Deceased patients (n=293) | P value |
Gender (male), n (%) | 599 (70) | 386 (69) | 213 (73) | .30 |
Age (years), mean (SD) | 70 (14) | 65 (14) | 78 (10) | |
Comorbidities, n (%) | 602 (71) | 364 (65) | 238 (81) | |
Hypertension | 499 (59) | 294 (53) | 205 (70) | |
Atrial fibrillation | 109 (13) | 58 (10) | 51 (17) | |
Chronic obstructive pulmonary disease | 130 (15) | 76 (14) | 54 (18) | .07 |
Dyslipidemia | 205 (24) | 132 (24) | 73 (25) | .67 |
Chronic kidney disease | 75 (9) | 42 (8) | 33 (11) | .07 |
Diabetes | 157 (18) | 90 (16) | 67 (23) | |
Cancer | 65 (8) | 38 (7) | 27 (9) | .22 |
Stroke | 28 (3) | 9 (2) | 19 (6) | |
Peripheral artery disease | 19 (2) | 10 (2) | 9 (3) | .23 |
Coronary artery disease | 96 (11) | 58 (10) | 38 (13) | .26 |
Time from symptom onset to admission (days), mean (SD) | 6.54 (3.94) | 6.71 (3.79) | 6.27 (4.16) | |
Fever, n (%) | 776 (91) | 513 (92) | 263 (90) | .32 |
Dyspnea, n (%) | 543 (64) | 317 (57) | 225 (77) | |
Cough, n (%) | 400 (47) | 280 (50) | 120 (41) | .18 |
Fatigue, n (%) | 174 (20) | 118 (21) | 56 (19) | .32 |
Diarrhea, n (%) | 77 (9) | 66 (12) | 11 (4) | .05 |
Syncope, n (%) | 43 (5) | 36 (6.5) | 7 (2) | .18 |
PaO2/FiO2 ratio | 225.93 (96.34) | 270.54 (83.82) | 196.54 (92.70) | |
pH | 7.45 (0.07) | 7.46 (0.07) | 7.45 (0.07) | .35 |
PaO2 | 60.16 (18.58) | 59.68 (15.94) | 60.56 (20.54) | .71 |
PaCO2 | 35.75 (10.37) | 35.36 (8.52) | 36.05 (11.58) | .62 |
HCO3 | 25.43 (6.78) | 26.22 (9.12) | 24.81 (3.97) | .23 |
Major laboratory markers were tracked upon admission. Specifically, lactate dehydrogenase, creatine kinase, creatinine, and glycemia were significantly higher in nonsurvivors than in survivors, whereas serum cholinesterase was lower (
Laboratory findings upon admission for the overall study sample and a comparison of survivors and nonsurvivors.
Laboratory parameter | All patients (N=852), mean (SD) | Patients discharged alive (n=559), mean (SD) | Deceased patients (n=293), mean (SD) | P value |
Glucose (mg/dl) | 145 (66) | 137 (59) | 159 (76) | |
Urea (mg/dl) | 57 (40) | 47 (24) | 76 (54) | |
Creatinine (mg/dl) | 1.24 (0.90) | 1.06 (0.54) | 1.59 (1.27) | |
Sodium (mEq/l) | 137 (8) | 137 (8) | 137 (7) | .24 |
Potassium (mEq/l) | 4.17 (0.55) | 4.14 (0.49) | 4.24 (0.65) | |
Chloride (mEq/l) | 99.26 (7.21) | 98.84 (7.19) | 100.05 (7.17) | |
Total bilirubin (mg/dl) | 0.75 (0.48) | 0.72 (0.35) | 0.82 (0.66) | |
Direct bilirubin (mg/dl) | 0.22 (0.60) | 0.21 (0.69) | 0.25 (0.37) | .31 |
AST (U/L) | 61 (84) | 53 (37) | 79 (136) | |
ALT (U/L) | 48 (70) | 47 (44) | 48 (103) | .90 |
LDH (U/L) | 430 (220) | 391 (160) | 509 (292) | |
Creatine kinase (U/L) | 300 (637) | 231 (387) | 429 (932) | |
Amylase (U/L) | 73 (48) | 69 (37) | 80 (63) | |
Lipase (U/L) | 47 (72) | 43 (46) | 56 (105) | .06 |
Serum cholinesterase (U/L) | 6275 (1858) | 6674 (1763) | 5576 (1812) | |
WBC (×10³/µl) | 8.12 (4.68) | 7.86 (4.72) | 8.63 (4.56) | |
RBC (×10⁶/µl) | 4.69 (0.72) | 4.79 (0.68) | 4.51 (0.77) | |
Hemoglobin (g/dl) | 13.59 (1.91) | 13.83 (1.72) | 13.14 (2.16) | |
Hematocrit (%) | 41.84 (5.70) | 42.37 (5.34) | 40.83 (6.22) | |
MCV (fl) | 89.74 (6.66) | 89.18 (5.62) | 90.80 (8.19) | |
MCH (pg) | 29.13 (2.38) | 29.05 (2.12) | 29.28 (2.80) | .23 |
MCHC (g/dl) | 32.43 (1.36) | 32.56 (1.15) | 32.17 (1.66) | |
Platelets (×10³/µl) | 217.75 (117.90) | 221.08 (127.10) | 211.41 (97.72) | .22 |
RDW (%) | 13.65 (1.65) | 13.27 (0.27) | 14.29 (1.99) | |
Neutrophils (%) | 77.45 (11.57) | 75.81 (11.75) | 80.56 (10.55) | |
Lymphocytes (%) | 15.17 (9.20) | 16.48 (9.45) | 12.67 (8.15) | |
Monocytes (%) | 6.89 (4.30) | 7.16 (4.01) | 6.36 (4.76) | |
Eosinophils (%) | 0.32 (0.91) | 0.38 (1.05) | 0.20 (0.54) | |
Lymphocytes (×10³/µl) | 1.09 (0.99) | 1.15 (0.94) | 0.98 (1.09) | |
Monocytes (×10³/µl) | 0.51 (0.41) | 0.52 (0.35) | 0.51 (0.51) | .77 |
Eosinophils (×10³/µl) | 0.02 (0.07) | 0.03 (0.08) | 0.02 (0.05) | |
Neutrophils (×10³/µl) | 6.41 (3.72) | 6.05 (3.41) | 7.11 (4.15) | |
PT (seconds) | 15.84 (8.38) | 15.07 (5.83) | 17.03 (11.11) | |
Prothrombin activity (%) | 68.40 (15.96) | 69.86 (14.38) | 66.27 (17.82) | |
INR | 1.40 (0.76) | 1.34 (0.65) | 1.51 (0.93) | |
PTT (seconds) | 31.70 (5.74) | 31.32 (4.48) | 32.29 (7.22) | .08 |
PTT ratio | 1.02 (0.19) | 1.00 (0.14) | 1.04 (0.25) | .06 |
C-reactive protein (mg/dl) | 11.19 (8.55) | 9.85 (7.88) | 13.74 (9.17) | |
NLR | 7.99 (6.74) | 6.78 (5.04) | 10.27 (8.68) | |
AST: aspartate aminotransferase.
ALT: alanine aminotransferase.
LDH: lactate dehydrogenase.
WBC: white blood cell count.
RBC: red blood cell count.
MCV: mean corpuscular volume.
MCH: mean corpuscular hemoglobin.
MCHC: mean corpuscular hemoglobin concentration.
RDW: red cell distribution width.
PT: prothrombin time.
INR: international normalized ratio.
PTT: partial thromboplastin time.
NLR: neutrophil-to-lymphocyte ratio.
Using the forward-search algorithm, the following 6 most important predictors at hospital admission were identified and used to compute the Piacenza score: age, MCHC, PaO2/FiO2 ratio, temperature, previous cerebrovascular stroke, and gender.
The median of the ROC curve over 100 test cohorts (generated through bootstrapping) is reported in
The calibration plot of the Piacenza score over the range of risk showed a Brier score of 0.19. The risk deciles were grouped into three levels: low risk (first to fifth deciles), intermediate risk (sixth to eighth deciles), and high risk (ninth and tenth deciles). A gradual and progressive increase in absolute event rates was observed across risk classes for the Piacenza score (death: 14% [18/125] in low-risk deciles vs 36% [27/75] in intermediate-risk deciles vs 66% [33/50] in high-risk deciles).
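The grouping of risk deciles into three classes can be sketched as below. The function name `risk_class_counts` and the synthetic cohort are hypothetical; only the counts in `reported` come from the text, and they are used solely to recompute the quoted percentages.

```python
import numpy as np

def risk_class_counts(y_true, risk):
    """Return {risk class: (deaths, patients)} using deciles of predicted risk."""
    y_true, risk = np.asarray(y_true), np.asarray(risk, dtype=float)
    # Rank patients by predicted risk, then cut the ranks into 10 equal bins.
    ranks = np.argsort(np.argsort(risk, kind="stable"), kind="stable")
    decile = ranks * 10 // len(risk)               # 0 = lowest risk, 9 = highest
    classes = {
        "low": decile <= 4,                        # first to fifth deciles
        "intermediate": (decile >= 5) & (decile <= 7),   # sixth to eighth deciles
        "high": decile >= 8,                       # ninth and tenth deciles
    }
    return {name: (int(y_true[mask].sum()), int(mask.sum())) for name, mask in classes.items()}

# Synthetic cohort of 250 patients (NOT study data).
rng = np.random.default_rng(7)
risk = rng.random(250)
died = (rng.random(250) < risk).astype(int)
counts = risk_class_counts(died, risk)

# Percentages quoted in the text, recomputed from the reported counts.
reported = {"low": (18, 125), "intermediate": (27, 75), "high": (33, 50)}
reported_pct = {name: round(100 * d / n) for name, (d, n) in reported.items()}
```

With 250 patients, the three classes contain 125, 75, and 50 patients, which matches the denominators quoted in the text.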
(A) Receiver operating characteristic (ROC) curves obtained by evaluating the Piacenza score (red curve) on the test cohort and on the external validation cohort. (B) ROC curves obtained by evaluating the Piacenza score (red curve) and the naïve Bayes (NB) model trained with 14 manually chosen features (green curve). AUC: area under the ROC curve.
Negative predictive value (NPV), positive predictive value (PPV; or precision), accuracy, sensitivity (or recall), specificity, F1 score, and F2 score for all scores. These metrics were calculated at a specific threshold on the final risk score probability, chosen by maximizing the F2 score; F2 favors high recall and therefore the correct identification of patients at risk.
Scores | Threshold | NPV | PPV | Accuracy | Sensitivity | Specificity | F1 score | F2 score |
Piacenza score | 0.16 | 0.93 | 0.40 | 0.55 | 0.94 | 0.37 | 0.56 | 0.74 |
Piacenza score–external validation | 0.16 | 0.97 | 0.37 | 0.57 | 0.95 | 0.44 | 0.53 | 0.72 |
Naïve Bayes model trained with 14 manually chosen features | 0.04 | 0.88 | 0.54 | 0.67 | 0.88 | 0.55 | 0.67 | 0.78 |
4C mortality score | 0.12 | 0.98 | 0.39 | 0.53 | 0.99 | 0.34 | 0.56 | 0.76 |
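As a consistency check on the table above, the F1 and F2 scores can be recomputed from the reported PPV and sensitivity with the standard F-beta formula. The dictionary keys are shortened labels chosen for this sketch.

```python
def f_beta(ppv, sensitivity, beta):
    """F-beta from precision (PPV) and recall (sensitivity)."""
    b2 = beta ** 2
    return (1 + b2) * ppv * sensitivity / (b2 * ppv + sensitivity)

# (PPV, sensitivity) pairs copied from the table.
table = {
    "Piacenza score": (0.40, 0.94),
    "Piacenza score, external validation": (0.37, 0.95),
    "Naive Bayes, 14 features": (0.54, 0.88),
    "4C mortality score": (0.39, 0.99),
}

# Recomputed (F1, F2) per score, rounded to 2 decimals like the table.
recomputed = {
    name: (round(f_beta(ppv, sens, 1), 2), round(f_beta(ppv, sens, 2), 2))
    for name, (ppv, sens) in table.items()
}
```

The recomputed pairs agree with the F1 and F2 columns of the table to two decimals.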
From the computed calibration plot, we can observe that the mortality risk is underestimated only in the first few deciles, while in the higher deciles the risk is slightly overestimated (
Regarding the relative importance of each feature, taken independently of the others, age was the most important feature to predict death followed by MCHC, PaO2/FiO2 ratio, previous cerebrovascular stroke, gender, and temperature (
Risk of observed death according to deciles of event probability based on the Piacenza score (A), the Piacenza score on the external validation data set (B), and the naïve Bayes (NB) model trained with 14 manually chosen features (C). For every single case, the corresponding calibration plots with standard deviations calculated over the deciles are also shown below each respective graph (D, E, and F).
Radar plot for the 6 Piacenza score predictors of death and for the 14 manually chosen features, showing their relative importance. Feature importance is scaled with respect to the most important feature. NB: naïve Bayes, MCHC: mean corpuscular hemoglobin concentration, CRP: C-reactive protein, LDH: lactate dehydrogenase, NLR: neutrophil-to-lymphocyte ratio, P/F: PaO2/FiO2, RR: respiratory rate, SBP: systolic blood pressure.
The corresponding median of the AUC in the external validation cohort was 0.79 (95% CI 0.68-0.89) with a Brier score of 0.16 (
The calibration plot is reported in
The median of the AUC was 0.78 (95% CI 0.73-0.83) with a sensitivity of 99% and specificity of 34% for the 4C score when evaluated on the test cohort. The corresponding Brier score was equal to 0.26 (
Performance of the 4C mortality score (both in terms of discrimination and calibration abilities) calculated on the test cohort. ROC: receiver operating characteristic, AUC: area under the ROC curve.
The observed mortality increased gradually and progressively for the naïve Bayes model with manually chosen features—death: 14% (17/125) in low-risk deciles vs 32% (14/75) in intermediate-risk deciles vs 72% (36/50) in high-risk deciles. This was not observed for the 4C score—death: 33% (41/125) in low-risk deciles vs 31% (23/75) in intermediate-risk deciles vs 36% (18/50) in high-risk deciles. Both scores achieved a satisfactory patient stratification only in the last three deciles whereas the 4C mortality score overestimated the prediction in the high-risk deciles and underestimated it in the low-risk ones (
In this study, we developed and validated a machine learning–based risk score—the Piacenza score—to predict mortality risk among hospitalized patients with COVID-19 pneumonia. This score is based on only 6 variables that are readily available at hospital admission.
Satisfactory performance, measured in terms of AUCs in both the testing and external validation cohorts, was achieved with excellent patient stratification. More specifically, at the chosen threshold the Piacenza score showed high sensitivity at the cost of lower specificity. It also underestimated the mortality risk in the first three risk deciles, while slight overestimation occurred in the other deciles. This behavior is acceptable and preferred in an acute setting since the score has been designed as a screening predictive tool capable of distinguishing patients at low risk from those at high risk of mortality.
In crowded hospitals, and with shortages of medical resources, this simple model can help to quickly prioritize patients: if the patient’s estimated risk is low, the clinician may choose to monitor the patient, whereas a high-risk estimate might support aggressive treatment or admission to the ICU. Data from China, Europe, and the United States reported a hospitalization rate of 20% to 31%, ICU admission rates from 17% to 35%, and an in-hospital mortality rate between 15% and 40% [
In the presence of a large number of patients requiring intensive care and threatening to overwhelm health care systems around the world, several models to predict survival and guide clinical decisions in COVID-19 pneumonia were developed [
The recent spread of artificial intelligence has brought novel ways to combat current global pandemics by collecting and analyzing large amounts of data, identifying trends, stratifying patients on the basis of risk, and proposing solutions at the population level instead of at the single individual level [
During the COVID-19 pandemic, machine learning approaches have been used to predict the outbreak, to diagnose the disease, to analyze chest x-ray and CT (computed tomography) scan images, and more recently to predict mortality or progression risk to severe respiratory failure [
Yuan and colleagues [
The 4C mortality score, developed and validated by the International Severe Acute Respiratory and Emerging Infections Consortium, based on 8 clinical and laboratory variables, achieved an AUC of 0.78 in predicting mortality. It is easy to use and has a pragmatic design. In fact, to calculate the score, no external tool or complex mathematical equation is required, and results can be immediately retrieved at the bedside [
The performance of our model is comparable with that of the 4C mortality score applied to the test cohort used in this paper. However, we remark that the 4C mortality score was derived from a population of 35,000 patients, while the naïve Bayes model providing the Piacenza score was trained using information from only 852 patients. This is indicative of the high representativeness of the training cohort considered in our study. Furthermore, although the 4C score and the Piacenza score have similar discriminative power, the latter showed better performance in stratifying patients according to their mortality risk, which is of paramount importance in selecting the appropriate treatment and for resource allocation. We also externally tested our score, achieving good performance and confirming that our data-driven model is robust despite its reliance on variables deemed relevant in this context without actually knowing their semantics.
The Piacenza score contains parameters reflecting patient demographics, comorbidity, and physiology at hospital admission. It shares some characteristics with the 4C score such as age, gender, comorbidities, and PaO2/FiO2 but also includes unexplored features like temperature and MCHC deriving from a substantially different selection of variables. Unlike traditional scores based on logistic regression analysis mixed with a knowledge-driven approach where a score is assigned by an expert to each of the limited number of selected variables, the proposed predictive model is purely data driven and is not affected by a clinically oriented, potentially biased choice of variables [
The Piacenza score is highly customizable and can be adapted as more information becomes available on disease progression and the impact of interventions like vaccines and new pharmacological treatments. In fact, the naïve Bayes algorithm, during its learning phase, generates a summary of the data set where each variable is associated with the outcome in terms of a probabilistic dependence. This summary describes the distribution of the current data set and can be quickly and easily updated when a new observation is available, adapting itself to changes within the population. Likewise, if new data are available, they can be used to train a new version of the Piacenza score and study the possible fingerprints of COVID-19 variants.
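The incremental update of such a summary can be sketched with running means and variances. Welford's online algorithm is used here as one possible choice; the text does not say how the update is implemented, and the class name `RunningGaussian`, the dictionary layout, and the four ages are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RunningGaussian:
    """Running mean and variance of one feature within one outcome class."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0          # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold a single new observation into the summary (Welford's algorithm)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self.m2 / self.n if self.n else float("nan")

# One summary per (outcome, feature) pair; here only age among deceased patients.
summary = {("deceased", "age"): RunningGaussian()}
for new_age in (70.0, 82.0, 65.0, 91.0):           # hypothetical new observations
    summary[("deceased", "age")].update(new_age)

age_stats = summary[("deceased", "age")]
```

Because only the count, mean, and squared-deviation sum are stored, a new patient can be added without revisiting the earlier records.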
The Piacenza score is thus highly flexible; if some of the required variables are missing, the model can be retrained and the physician can still receive a customized result (associated with the best possible accuracy with respect to the variables provided). The retraining process can take up to 10 hours, depending on the number of features inserted. However, depending on future requests, the code can be easily optimized and run on more powerful hardware.
An example of a personalized model different from the Piacenza score is the naïve Bayes model trained with 14 manually chosen features, which showed a predictive power comparable to that of the Piacenza score. Other models differ in performance; however, as demonstrated, age and the PaO2/FiO2 ratio contribute most to the predictive power of the model. Therefore, starting with age and the PaO2/FiO2 ratio and adding more variables will lead to predictive performance similar to that of the Piacenza score, which represents the best combination for stratifying patients and predicting mortality.
Finally, our score’s predictors were not chosen a priori (like, for example, the 4C mortality score) but as the product of a machine learning–based optimization technique, which considers the smallest possible subset of leading predictors associated with the best possible performance.
The approach proposed in our paper is suitable for risk stratification and mortality assessment of other conditions as well, such as heart failure (HF), which constitutes a growing public health issue. In fact, although machine learning has made significant contributions to health care in just a few years, little evidence exists on the role of machine learning in predicting mortality in patients with HF and in general with cardiovascular diseases. In this context, several researchers have developed prognostic risk scores for HF such as the Seattle Heart Failure Model and the Meta-Analysis Global Group in Chronic Heart Failure [
This study has room for further improvement, which is left for future work. First, given that the proposed machine learning method is purely data driven, our model may vary if a different data set is used. As more data become available, the model can be refined and the performance of the Piacenza score can further increase. To this end, we look forward to subsequent large-sample, multicenter studies. Second, the forward-search algorithm (used to select the Piacenza score predictors and, most importantly, to personalize the Piacenza score on any other subset of features) is computationally expensive and can be optimized in further versions of the code. Finally, new variables such as d-dimer and troponin, which are currently not available but are known to be associated with a higher mortality risk in cases of COVID-19 pneumonia, may be included in future analyses.
In conclusion, we have developed and validated robust machine learning models, which could be used to predict the prognosis of patients with COVID-19. The Piacenza score has several advantages: first, it relies on objective clinical and laboratory measurements not affected by human interpretation; second, it was tested and validated in patients belonging to the second wave of the pandemic; third, it is automatically generated through a combination of variables widely available at hospital admission and can be calculated through a user-friendly web interface; and finally, as opposed to traditional epidemiological predictive models, the Piacenza score has the added advantage of adaptive learning, trend-based recalibration, and flexibility.
AUC: area under the receiver operating characteristic curve
CSV: comma-separated values
CT: computed tomography
EHR: electronic health record
HF: heart failure
ICU: intensive care unit
MCHC: mean corpuscular hemoglobin concentration
NPV: negative predictive value
PPV: positive predictive value
ROC: receiver operating characteristic
RT-PCR: reverse transcriptase–polymerase chain reaction
No sponsor had any role in the study design, data collection, data analysis, data interpretation, or writing of the paper.
GH, DP, MV, and MAD conceived the study. A Biagi, LR, A Botti, CM, AN, MM, and ES collected the data. MS, UM, AN, and DP managed and analyzed the data. MM, FP, LR, and EM developed the website. PA and MN provided clinical expertise. MP supervised the work. All authors interpreted the results. GH, MAD, MV, and DP wrote the manuscript, which was approved by all the authors.
None declared.