Predictability of COVID-19 Hospitalizations, Intensive Care Unit Admissions, and Respiratory Assistance in Portugal: Longitudinal Cohort Study

Background: In the face of the current COVID-19 pandemic, the timely prediction of upcoming medical needs for infected individuals enables better and quicker care provision when necessary and management decisions within health care systems.
Objective: This work aims to predict the medical needs (hospitalizations, intensive care unit admissions, and respiratory assistance) and survivability of individuals testing positive for SARS-CoV-2 infection in Portugal.
Methods: A retrospective cohort of 38,545 infected individuals during 2020 was used. Predictions of medical needs were performed using state-of-the-art machine learning approaches at various stages of a patient’s cycle, namely, at testing (prehospitalization), at posthospitalization, and during postintensive care. A thorough optimization of state-of-the-art predictors was undertaken to assess the ability to anticipate medical needs and infection outcomes using demographic and comorbidity variables, as well as dates associated with symptom onset, testing, and hospitalization.
Results: For the target cohort, 75% of hospitalization needs could be identified at the time of testing for SARS-CoV-2 infection. Over 60% of respiratory needs could be identified at the time of hospitalization. Both predictions had >50% precision.
Conclusions: The conducted study pinpoints the relevance of the proposed predictive models as good candidates to support medical decisions in the Portuguese population, including both monitoring and in-hospital care decisions. A clinical decision support system is further provided to this end.
(J Med Internet Res 2021;23(4):e26075) doi: 10.2196/26075


Introduction
Background
COVID-19 is a disease caused by the novel coronavirus SARS-CoV-2, transmissible from person to person and associated with acute respiratory complications in severe cases [1,2]. The main symptoms of infected patients are fever, cough, and fatigue, although some remain asymptomatic [3]. The COVID-19 pandemic presents a substantial threat to global health and has been directly responsible for many deaths. Since the first outbreak in December 2019 in Wuhan, China, the number of confirmed infected patients worldwide has exceeded 55 million cases, and nearly 1.3 million people have died from COVID-19 [4]. Current literature has shown that infected patients with specific comorbidities or preconditions (eg, hypertension, respiratory problems, diabetes) and of old age are expected to develop a more severe response to the infection and may consequently need longer hospitalizations and intensive care [5][6][7]. Strict social confinement measures have been implemented to decrease the COVID-19 R0 value (average number of individuals infected by each infected person) and guarantee the optimal use of equipment and beds at normal, continuous, and intensive care units (ICUs). However, although public health responses aim to delay the spread of the infection, several countries such as the United States, Brazil, Italy, and India have faced severe health care crises.
Without effective antiviral drugs and a vaccine, prognostic tools related to COVID-19 are required. Statistical and computational models could assist clinical staff in triaging patients at high risk for respiratory failure to better guide the allocation of medical resources. Recently, several predictive models ranging from statistical and score-based systems to more recent machine learning models have been proposed in response to COVID-19. Guan et al [8] proposed a Cox regression model to infer potential risk factors associated with serious adverse outcomes in patients with COVID-19. Univariate and multivariate logistic regression models have been used to determine risk factors associated with mortality [9]. Scoring systems have been proposed to predict COVID-19 patient mortality but are limited by small sample sizes, with a poor discriminatory ability [10][11][12]. Other statistical approaches have also been emerging to aid prognostics [13,14]. Complementarily, machine learning methods offer the possibility to model more complex data relationships, generally yielding powerful capabilities to predict outcomes of infectious and noninfectious diseases in medical practice [15][16][17]. To this end, classification and regression models have been proposed for risk stratification of patients and to screen the spread of COVID-19 [18][19][20]. Despite the inherent potentialities of ongoing efforts, studies in the context of COVID-19 are limited by either the size of available cohorts or the lack of a systematic comparison of different models [21][22][23][24], and generally neglect the predictability of medical needs (instead the focus is commonly placed on measurable disease factors, early detection of infection, and mortality risk prediction [25][26][27][28]). None of these studies have comprehensively targeted the Portuguese population at the present time.

This Study
This study provides a structured view on the predictability of hospitalizations, ICU admissions, respiratory assistance needs, and survivability outcomes using a retrospective cohort encompassing individuals with a SARS-CoV-2-positive result in Portugal as of June 30, 2020.
To this end, and considering demographic, comorbidity, and care provision variables collected for the infected individuals, an assessment methodology was conducted, whereby state-of-the-art predictive models were hyperparameterized and robustly evaluated in order to assess the upper bounds on the predictive performance for each one of the targeted variables. In addition, whenever applicable, this analysis was extended toward the various stages of a patient's cycle: prehospitalization (at the time of testing), after hospitalization, and after ICU admission.
This study offers a solid methodology for the robust assessment of the predictability guarantees of future care needs of infected individuals, contrasting with the dominant correlation-based guarantees in literature. As comparable studies demonstrated in other populations, it lays a solid ground to compare type-I and type-II predictive errors and assess population-wise differences.

Methods
Overview
Complete subpopulations from the target cohort were identified for each output (Figure 1), guaranteeing the presence of all individuals undertaking the target forms of care (hospitalization, ICU admission, respiratory support) with a recovery-or-death outcome.
After the sampling and data curation steps (Figure 1), we proceeded to the optimization of data preprocessing options and classifiers' parameterization for each of the target variables separately. To this end, we applied a nested 10-fold cross-validation assessment methodology: we first create train-test partitions (outer cross-validation) to assess the performance of an optimized classification method, and within each training fold we further create train-test partitions (inner cross-validation) to hyperparameterize the predictive model under assessment. This methodology guarantees that all observations are used to assess the final performance and prevents biases, as hyperparameterization takes place only within each training fold.
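The nested partitioning described above can be sketched in plain Python. This is only an illustrative sketch: the function names (`k_fold_indices`, `nested_cv_splits`) are hypothetical, the folds are plain shuffled folds rather than stratified ones (the text does not state which was used), and the inner fold count of 10 is an assumption.

```python
import random

def k_fold_indices(indices, k, seed=0):
    """Shuffle `indices` and split them into k nearly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def nested_cv_splits(n_samples, outer_k=10, inner_k=10, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for nested cross-validation.

    inner_splits is a list of (inner_train, inner_val) pairs drawn only from
    outer_train, so the outer test fold never influences hyperparameter choice.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for i, outer_test in enumerate(outer_folds):
        # Outer training set: every outer fold except the held-out one.
        outer_train = [idx for j, fold in enumerate(outer_folds)
                       if j != i for idx in fold]
        # Inner folds partition the outer training set only.
        inner_folds = k_fold_indices(outer_train, inner_k, seed + i + 1)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [idx for n, fold in enumerate(inner_folds)
                           if n != m for idx in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits

# Hypothetical cohort of 200 observations, used only to exercise the splitter.
splits = list(nested_cv_splits(n_samples=200))
```

In such a scheme, hyperparameters would be selected on the inner splits, the chosen configuration refit on `outer_train`, and performance reported on `outer_test`, so that every observation is used exactly once for the final assessment.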
Within each inner train-test fold, Bayesian optimization [29] was applied to find the hyperparameters that best fit the pipeline. The optimization measures are:
• F1 score and 0.7 × recall + 0.3 × precision for binary classes. These two views generate two sets of classifiers: one that weights recall and precision equally, and another that, similar to the F2 score (Fβ, where β=2), prioritizes the optimization of the true-positive rate (recall) at the cost of a lower positive predictive value (precision);
• Cohen kappa and average class recall for target variables with more than 2 classes (respiratory support).
Hospitalization, ICU admission, respiratory support, and recovery-or-death outcomes for SARS-CoV-2-infected individuals are considerably imbalanced, hence the relevance of the recall-precision and multiclass recall views. In particular, considering both a balanced recall-precision optimization and a recall-oriented optimization is relevant for clinical decisions. When the allocated teams have the capacity to remotely monitor SARS-CoV-2-infected patients, the predictive models optimized with a schema that prioritizes recall should be pursued to guarantee that no vulnerable patient is left out. Nevertheless, when monitoring capacity is limited, greater attention to precision is necessary, and monitoring should be directed only to the more vulnerable patients (as suggested by the predictive models optimized with balanced recall-precision views).
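The optimization measures listed above can be written out explicitly. The following is a minimal plain-Python sketch using the standard definitions; the function names and the toy labels are hypothetical and are not taken from the study's code or data.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall of the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def f_beta(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives F1, beta=2 weights recall more heavily."""
    denom = beta ** 2 * precision + recall
    return (1 + beta ** 2) * precision * recall / denom if denom else 0.0

def recall_weighted_score(precision, recall, w_recall=0.7):
    """Linear recall-oriented measure: 0.7 x recall + 0.3 x precision."""
    return w_recall * recall + (1 - w_recall) * precision

def average_class_recall(y_true, y_pred):
    """Unweighted mean of per-class recall (multiclass targets)."""
    classes = sorted(set(y_true))
    recalls = [precision_recall(y_true, y_pred, positive=c)[1] for c in classes]
    return sum(recalls) / len(recalls)

def cohen_kappa(y_true, y_pred):
    """Agreement between truth and prediction, corrected for chance."""
    y_true, y_pred = list(y_true), list(y_pred)
    n = len(y_true)
    observed = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n)
                   for c in set(y_true) | set(y_pred))
    # Kappa is undefined when chance agreement is 1; 0.0 is returned here
    # as a convention of this sketch.
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0

# Toy labels (hypothetical): 4 positives, 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
f1 = f_beta(precision, recall, beta=1.0)
f2 = f_beta(precision, recall, beta=2.0)
weighted = recall_weighted_score(precision, recall)
kappa = cohen_kappa(y_true, y_pred)
```

On these toy labels, precision is 0.6 and recall is 0.75, so both the F2 score and the 0.7/0.3 weighted measure exceed the F1 score, reflecting their preference for recall.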
The allowed preprocessing options are as follows: imputation of missing values using median-mode imputation, KNNImputer, or none; class balancing using subsampling, oversampling, SMOTE (Synthetic Minority Oversampling Technique), or none; and normalization of real-valued variables using standardization, scaling, or none. The selected classifiers are as follows: Bernoulli naive Bayes, Gaussian naive Bayes, k-nearest neighbors (KNN), decision tree (DT), random forest, XGBoost (XGB), logistic regression, Light Gradient Boosting Machine (LightGBM), Super Learner, and multilayer perceptron (MLP). Super Learner uses folding to hyperparameterize models and selects predictors for out-of-fold predictions from individual performance estimates per fold. In this context, Super Learner's performance is generally coincident with the best predictive model and thus not always disclosed in the Results section to allow the identification of the best underlying predictors. We considered the implementations provided in the scikit-learn [30] and xgboost [31] packages in Python (Python Software Foundation). For each classifier, all supported parameters in scikit-learn were subjected to hyperparameterization. Regarding the MLP, we placed upper limits on the number of hidden layers (3) and nodes per layer (20) given the low-dimensionality nature of the target data set. The hyperparameters were subjected to a total of 50 iterations. Multimedia Appendix 1 displays the optimized parameters for the best-performing predictive models per outcome.
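To give a sense of the size of the search space described above, the preprocessing options and classifiers can be enumerated as a grid. This is a hedged sketch: the option labels and function names are hypothetical identifiers chosen for illustration, and the assumption that an MLP has at least one hidden layer with at least one node is ours, not the text's.

```python
from itertools import product

# Option labels are illustrative identifiers, not names from the study's code.
IMPUTATION = ["median-mode", "knn", "none"]
BALANCING = ["subsampling", "oversampling", "smote", "none"]
NORMALIZATION = ["standardization", "scaling", "none"]
CLASSIFIERS = [
    "bernoulli_nb", "gaussian_nb", "knn", "decision_tree", "random_forest",
    "xgboost", "logistic_regression", "lightgbm", "super_learner", "mlp",
]

def pipeline_configurations():
    """Yield every combination of preprocessing choices and classifier."""
    for imp, bal, norm, clf in product(IMPUTATION, BALANCING,
                                       NORMALIZATION, CLASSIFIERS):
        yield {"imputation": imp, "balancing": bal,
               "normalization": norm, "classifier": clf}

def is_allowed_mlp_architecture(hidden_layer_sizes, max_layers=3, max_nodes=20):
    """Check the stated MLP limits: at most 3 hidden layers, 20 nodes each."""
    return (1 <= len(hidden_layer_sizes) <= max_layers
            and all(1 <= n <= max_nodes for n in hidden_layer_sizes))

configs = list(pipeline_configurations())
n_configs = len(configs)  # 3 * 4 * 3 * 10 combinations
```

Each of these 360 structural combinations still carries its own classifier-specific hyperparameters, which is why a guided search such as Bayesian optimization is preferable to exhaustive enumeration.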
Differences in performance from the paired-error estimates collected per fold were statistically tested using t tests when estimates passed the Shapiro-Wilk normality test. When this condition was not satisfied, Wilcoxon signed-rank tests were applied.
In addition to the conducted analysis, the best predictors trained on the whole data set were made available within a clinical decision support system built with Flask and Dash in Python [32], which can run as an offline web application.

Data Source
A retrospective cohort (from March 1 to June 30, 2020) of patients with confirmed COVID-19 in Portugal was used in this study. The anonymized data set was provided by the Directorate General of Health (Direcção-Geral da Saúde, DGS), the Portuguese health authority. The gathered data, called the covid19-DGS database, contains information pertaining to the demographic and clinical patient characteristics as well as preexisting conditions. Data are available upon reasonable request.

Ethical Considerations
The COVID-19 data set is provided by the DGS under the collaborative score4COVID research project proposal. The tasks conducted in the score4COVID project were further validated by the Ethical Committee of the NOVA School of Science and Technology.

Results
Results on the predictability of hospitalization needs, ICU admissions, respiratory assistance, and outcome of infected individuals living in Portugal, as of June 30, 2020, are discussed below.

Cohort Characteristics
The target cohort comprised 38,545 individuals who were SARS-CoV-2 positive: 17,046 recoveries (SARS-CoV-2 negative after positive testing) and 1155 deaths. Four individuals were excluded from the data set due to inconsistent recordings related to age and pregnancy-gender variables. Table 1 provides essential statistics. Figure 2 further describes sex and age distributions in deaths, hospitalizations, ICU admissions, and average number of days from symptom onset (traced by the public health line for COVID-19) to a positive test result and hospitalization.
Within the target population, there were 4326 hospitalizations (11.2% of population base) and 253 admissions to the ICU (5.8% of hospitalizations). Among ICU admissions, there were 82 recoveries and 61 deaths. In terms of respiratory support, a total of 180 individuals received assisted ventilation, 292 received oxygen therapy, and 9 underwent alternative modes of respiratory support such as extracorporeal membrane oxygenation.
The major classes of comorbidities monitored were neoplasm, diabetes, asthma, pulmonary, hepatic, hematological, renal, neurological, neuromuscular, and immune deficiency conditions. The representativity of individuals with one or more comorbidities, as well as their impact on survivability, is depicted in Figure 3.

Hospitalization
Figure 4 and Table 2 provide results pertaining to the models' ability to predict the need for individuals to be hospitalized once they are tested as SARS-CoV-2 positive given their (1) demographic group (age and gender) and (2) comorbidity factors. Comorbidity factors were categorized in accordance with the presence or absence of kidney, asthma, lung, cancer, neuromuscular, diabetes, HIV, cardiac, and pregnancy conditions. Nonhospitalized individuals without a clear outcome (recovery or death) were excluded from this analysis. Figure 5 provides the receiver operating characteristic curve per predictor for each optimization setting.
Generally, we observed that nearly 90% of hospitalization needs could be identified at the time of SARS-CoV-2 testing. This level of recall/sensitivity was observed at the expense of an approximate 55% precision, meaning that more than half of the predicted hospitalization needs were in fact observed. Logistic regression and MLP were the best-performing classification models according to F1-score and recall, respectively. Statistical superiority was verified for logistic regression but not MLP against peer models (at α=.05). These results provide empirical evidence toward the role of these predictors in supporting individual remote monitoring decisions.
Table 3 assesses the ability to anticipate intensive care needs for infected individuals at two stages: before hospitalization and after hospitalization. To this end, the proposed methodology was pursued considering demographic factors, comorbidity factors, and the time to hospitalization for hospitalized individuals. Individuals without a SARS-CoV-2-negative test result after infection were excluded.
The predictability of ICU needs is less satisfactory than that of hospitalization needs, particularly for the prehospitalization stage. We hypothesize that this difficulty was partially related to the smaller number of individuals with ICU admissions, together with the presence of missing values associated with ICU admission needs for most individuals. Even though we can achieve recall levels over 90% with gradient boosting (XGBoost) in a posthospitalization setting, it comes at the cost of a considerably low precision (with one-third of predictions seen in practice). Still, the best-performing predictive models are suggested to support monitoring decisions at the hospital bedside, as their recall and specificity are considerably high.

Respiratory Support
Figure 8 and Table 4 assess respiratory assistance needs for hospitalized individuals with SARS-CoV-2, considering three assistance modes: (1) ventilation support, (2) oxygen therapy, and (3) combined ventilation and oxygen therapies. Demographic, comorbidity, and time-to-hospitalization factors were used as input variables.
Individuals without a SARS-CoV-2-negative test result after infection were excluded from this analysis. As respiratory support is a multiclass variable, we considered a different performance evaluation by focusing on (1) the recall for each major class (ventilation, oxygen, and nonrequired support), (2) the precision of individuals with oxygen or ventilation assistance, and (3) the Cohen kappa coefficient.
XGBoost, LightGBM, and random forests attained a satisfactory identification of hospitalized individuals who may require respiratory support in the future, generally providing recalls for each assistance mode around 60% at the cost of a 40% precision. According to the conducted methodology, they are thus pinpointed as good candidates to support in-hospital care decisions.
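The practical meaning of a recall-precision pair such as the one reported above can be illustrated with a short worked calculation. The sketch below is illustrative only: the function name and the group of 100 patients are hypothetical, and the 60% recall and 40% precision are the rounded values quoted in the text.

```python
def expected_alerts(n_needing_care, recall, precision):
    """Translate recall and precision into expected patient counts.

    recall    = identified / n_needing_care
    precision = identified / flagged
    """
    identified = recall * n_needing_care   # true positives
    flagged = identified / precision       # all positive predictions
    return {
        "identified": identified,
        "missed": n_needing_care - identified,   # false negatives
        "flagged": flagged,
        "false_alarms": flagged - identified,    # false positives
    }

# Hypothetical group of 100 hospitalized patients who will need support.
counts = expected_alerts(n_needing_care=100, recall=0.6, precision=0.4)
```

Under these assumptions, about 60 of the 100 patients would be identified in advance and 40 missed, while roughly 150 patients would be flagged in total, of whom 90 would not end up requiring support.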

Survivability (Outcome)
Finally, Figures 9 and 10 and Table 5 provide an analysis of the ability of the models to predict recovery-or-death outcomes for individuals with SARS-CoV-2 infection at three time points: (1) before hospitalization (at the time of testing), (2) after hospitalization, and (3) after ICU admission when applicable.
To this end, we preserved the input variables and validation methodology (see Methods section) considered in previous scenarios.
Our results showed a high ability to identify death outcomes. However, at the SARS-CoV-2 testing stage, this comes at a cost of incorrectly classifying two-thirds of individuals susceptible to death. In the posthospitalization scenario, we achieved more balanced results, with both precision and recall around 75% using gradient boosting (XGBoost and LightGBM). The introduction of the intensive care variable hampered the results since it restricted the analysis of deaths to individuals with acute needs and dependent on continuous care instruments.

Determinants of Predictability
To assess the determinant factors underlying the achieved predictability levels, we first statistically tested the correlation between input and output variables using chi-square tests, ANOVA (analysis of variance), and their nonparametric counterparts, yielding results similar to those by Nogueira et al [33]. For a more in-depth understanding of the feature relevance for the assessed predictive models, Figures 11 and 12 illustrate the importance of the top features. To this end, we considered relevance outputs from gradient boosting (XGBoost) due to its competitively high performance across all outcomes, as well as the logistic regression for the hospitalization outcome by computing the Wald statistic to assess the significance of the coefficients for predictions. We can observe that XGBoost distinguishes the relevance of different comorbidities for the target variables along each stage of the care process. In addition to the age variable, the onset period to hospitalization in days was also found to be a critical factor affecting the decisions (Figure 12). This variable consistently ranked among the top features for associative models (XGBoost, random forests, and decision trees), pinpointing the importance of its collection for computer-aided predictions of ICU admission and respiratory needs. Complementarily, Figure 13 offers additional insights into the target predictive tasks by plotting some of the characteristics of the correctly classified individuals against incorrectly classified individuals with XGBoost. Particular attention should be paid to the differences between true positives and false negatives, that is, to the individuals requiring care, in order to guarantee their timely and proper assistance. The susceptibility to false negatives is higher for individuals within the 40-60-year age category and without comorbidities.
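For reference, the Wald statistic used above to gauge the significance of logistic regression coefficients can be computed as sketched below. This follows the standard textbook definition (squared ratio of a coefficient to its standard error, compared against a chi-square distribution with 1 degree of freedom); the coefficient values and feature names are invented for illustration and are not results from this study.

```python
import math

def wald_test(coefficient, standard_error):
    """Return (Wald statistic, P value) for a single regression coefficient.

    W = (coefficient / standard_error)^2 follows a chi-square distribution
    with 1 degree of freedom under the null hypothesis of a zero coefficient,
    whose upper tail equals erfc(sqrt(W / 2)).
    """
    z = coefficient / standard_error
    wald = z * z
    p_value = math.erfc(math.sqrt(wald / 2))
    return wald, p_value

# Hypothetical (coefficient, standard error) pairs; NOT values from the paper.
fitted = {
    "age": (0.045, 0.005),
    "diabetes": (0.30, 0.20),
}

# Rank features by decreasing Wald statistic.
ranking = sorted(
    ((name, *wald_test(coef, se)) for name, (coef, se) in fitted.items()),
    key=lambda row: row[1],
    reverse=True,
)
```

A larger Wald statistic (and a smaller P value) indicates a coefficient that is estimated more precisely relative to its magnitude, which is one way to order the input variables of a logistic model by their contribution.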

Clinical Decision Support System
The learned predictive models based on simple variables (stage, age, gender, and comorbidities) have been made available to health care providers within a recommendation system with graphical facilities. The serialized predictive models are used for the efficient testing of individuals at the different stages of the care cycle (testing, hospitalization, ICU admission) for the different outcome variables (care needs) after inserting essential demographic and comorbidity features. The output provides a bounded statistic based on the estimation returned by the predictive models achieving better recall and F1-measure for each outcome variable. Figure 14 provides a visualization of the graphical interface. The variables required for each outcome score calculation are usually available at hospitals, and the tool is easy to use. Although recommendations are provided within a statistical frame, the tool does not categorize patients as low- or high-risk, as clinical experts are better placed to approximate this risk. In addition, we advise caution for clinicians who intend to use this tool as a predictive guide, especially for survivability analysis. Clinicians must balance the predictions from this tool against their practical experience.
In collaboration with DGS, our predictors are expected to be provided within public hospitals and care contact centers of the Portuguese Health Service (Serviço Nacional de Saúde), particularly to support remote care monitoring decisions. The decision support system is available as a software tool on GitHub [34].

Principal Findings
This work offers a discussion on the predictability of hospitalization needs, ICU admissions, respiratory assistance, and survivability outcome in individuals infected with SARS-CoV-2 in Portugal as of June 30, 2020. A retrospective cohort with all confirmed COVID-19 cases since March, encompassing demographic and comorbidity variables, was considered as the target population in this study.
The results for the given cohort reveal that (1) over 75% of hospitalization needs can be identified at the time of SARS-CoV-2 testing (with >50% precision); (2) ICU needs are generally less predictable at both the pre- and posthospitalization stages in the given cohort; (3) respiratory assistance needs (including ventilation support, oxygen therapy, and combined ventilation-oxygen support) achieved recall levels above 60% (with >50% precision); and (4) death risk along different stages (testing time, after hospitalization, and after ICU admission) had the highest degree of predictability.
The predictive models yielding the best performance were associative classifiers, particularly XGBoost and random forests, neural networks with hyperparameterized architectures, and logistic regressors, with the optimal choice varying in accordance with the target variable and evaluation measure.
Publications on COVID-19 using machine learning models for different outcomes have been rapidly increasing. Gao et al [35] developed a model that includes the mortality risk prediction and reported an F1 ranging from 0.65 to 0.69 (κ=0.61-0.65), in line with our findings. Alternative studies [28,36,37] offer additional results for generalizing results and identifying population-specific differences. Yet, most of these studies do not comprehensively assess models' performance or the cohort characteristics, impeding solid cross-population findings.

Limitations
This study has some inherent shortcomings that should be noted: (1) the number of clinical variables for the outcomes of interest was limited (eg, BMI and clinical symptoms were missing); (2) further external validation of the selected models is required; and (3) although some inconsistencies (listed in the Cohort Characteristics section) and missing/unknown entries in the original DGS data set were excluded, data acquisition problems may still persist and influence the outcomes of this work. The fully autonomous and parameter-free nature of the proposed computational approach/models allows it to be dynamically retrained with updated data.

Concluding Remarks
In this work, we developed a web-based clinical decision support tool without biological variables as input that can be used by clinicians. The conducted work pinpoints the relevance of the proposed predictive models to aid medical decisions for the Portuguese population, including both remote monitoring and in-hospital care decisions. Predicting the most probable outcomes along the life cycle of a SARS-CoV-2-infected individual can identify patients who are expected to develop severe illness, thus optimizing the allocation of health care resources and supporting more vulnerable patients.