This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
Early detection and intervention are the key factors for improving outcomes in patients with COVID-19.
The objective of this observational longitudinal study was to identify nonoverlapping severity subgroups (ie, clusters) among patients with COVID-19, based exclusively on clinical data and standard laboratory tests obtained during patient assessment in the emergency department.
We applied unsupervised machine learning to a data set of 853 patients with COVID-19 from the HM group of hospitals (HM Hospitales) in Madrid, Spain. Age and sex were not considered while building the clusters, as these variables could introduce biases in machine learning algorithms and raise ethical implications or enable discrimination in triage protocols.
From 850 clinical and laboratory variables, four tests—the serum levels of aspartate transaminase (AST), lactate dehydrogenase (LDH), C-reactive protein (CRP), and the number of neutrophils—were enough to segregate the entire patient pool into three separate clusters. Further, the percentage of monocytes and lymphocytes and the levels of alanine transaminase (ALT) distinguished cluster 3 patients from the other two clusters. The highest proportion of deceased patients; the highest levels of AST, ALT, LDH, and CRP; the highest number of neutrophils; and the lowest percentages of monocytes and lymphocytes characterized cluster 1. Cluster 2 included a lower proportion of deceased patients and intermediate levels of the previous laboratory tests. The lowest proportion of deceased patients; the lowest levels of AST, ALT, LDH, and CRP; the lowest number of neutrophils; and the highest percentages of monocytes and lymphocytes characterized cluster 3.
A few standard laboratory tests, deemed available in all emergency departments, have shown good discriminative power for the characterization of severity subgroups among patients with COVID-19.
The COVID-19 pandemic has brought to light the scarcity of health care resources worldwide [
The vast and poorly structured body of available health data does not lend itself easily to parametric modeling. To overcome this issue, machine learning techniques have recently been identified as promising tools for individual class prediction, allowing a large number of variables to be handled simultaneously and inherent disease-related patterns in the data to be observed [
Machine learning for health care is a key discipline that aims to translate large health data sets into operative knowledge in different medical fields [
There are several research reports using COVID-19 data sets, which focus on predicting the patients’ mortality or severity by mainly using regression modeling from labeled clinical records [
Health agencies recommend that clinical decisions should be made based on an individual’s biological age rather than chronological age [
Frailty and multi-morbidities, as measures of biological aging, have been found to be risk factors for mortality independent of chronological age in patients with COVID-19 [
Furthermore, reports about case-fatality rates for COVID-19 categorized by age groups could sentence elderly people not only to social exclusion but also to health care indifference. Considering the elderly population as a highly vulnerable group is a simple and negative stereotype that may even influence decision making in clinical resource management [
The prevalence and severity of COVID-19 also vary by sex, with men experiencing higher mortality than women [
Since chronological age as well as sex cannot be considered as pivotal aspects to determine an individual’s health status and resilience [
Demographic variables (ie, age and sex) were not used in the previously published studies for building models on effective treatments based upon sex or age groups or for understanding sex or age differences [
The objective of this observational longitudinal study was to identify nonoverlapping severity subgroups (ie, clusters) among patients with COVID-19, using exclusive laboratory tests and clinical data obtained during the first medical contact in the emergency department, by means of unsupervised machine learning techniques. Age and sex were not taken into account to build the subgroups due to the ethical implications. For this purpose, we used the data set collected by the HM group of hospitals (HM Hospitales) in Madrid, Spain [
This study is a longitudinal analysis of the data set collected by the HM group of hospitals in Madrid, Spain, in the context of the project Covid Data Save Lives [
We collected the information for each patient identifier and compiled it into one record. This included age, sex, vital signs in the emergency department, and whether admission to the intensive care unit (ICU) was needed. COVID-19 symptoms, ICD-10 codes of previous and current conditions, and the laboratory tests performed in the emergency department were also recorded. For each patient, we also calculated the duration in days of the hospital stay, including ICU admission, and the days from hospitalization to ICU admission. We considered the first laboratory tests obtained in the emergency department and grouped all of the ICD-10 codes under the first three characters (ie, first letter and two subsequent numbers) of the code to reduce the number of variables and improve generalization. We codified each ICD-10 feature for inclusion in one of the following groups:
Only patients with a discharge reason of
The final sample of 853 patients was similar to the excluded sample (n=1457) in terms of age (mean 67.2, SD 15.7 years, vs mean 67.1, SD 17.0 years;
Laboratory tests used to characterize the patients.
Code | Description | Unit |
RDW | Red cell distribution width | % |
BAS | Basophils | ×10³/µL |
BAS% | Percentage of basophils | % |
MCHC | Mean corpuscular hemoglobin concentration | g/dL |
CREA | Creatinine | mg/dL |
EOS | Eosinophils | ×10³/µL |
EOS% | Percentage of eosinophils | % |
GLU | Glucose | mg/dL |
AST | Aspartate transaminase | U/L |
ALT | Alanine transaminase | U/L |
MCH | Mean corpuscular hemoglobin | pg |
HCT | Hematocrit | % |
RBC | Red blood cells | ×10⁶/µL |
HB | Hemoglobin | g/dL |
K | Potassium | mmol/L |
LDH | Lactate dehydrogenase | U/L |
LEUC | Leucocytes | ×10³/µL |
LYM | Lymphocytes | ×10³/µL |
LYM% | Percentage of lymphocytes | % |
MONO | Monocytes | ×10³/µL |
MONO% | Percentage of monocytes | % |
NA | Sodium | mmol/L |
NEU | Neutrophils | ×10³/µL |
NEU% | Percentage of neutrophils | % |
CRP | C-reactive protein | mg/L |
PLAT | Platelet count | ×10³/µL |
BUN | Blood urea nitrogen | mg/dL |
MCV | Mean cell volume | fL |
MPV | Mean platelet volume | fL |
Unsupervised automatic x-means clustering [
Patients were represented as vectors whose number of dimensions equaled the number of variables. In this case, 842 variables were used to apply the clustering algorithm. None of the eight variables on demographics, hospital stay, and outcome measures were included; they were removed from cluster formation because of potential ethical controversies and biases (ie, demographics) or because they carried prospective information (ie, hospitalization stay and outcome measures). The algorithm was applied using several similarity or distance metrics between patients [
To assess the quality of the cluster distributions produced by the algorithm under each of the above metrics, the Davies-Bouldin index was calculated for each of them [
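As an illustration of how such an index can be computed, the following is a minimal sketch of the Davies-Bouldin index for a given partition, using the Manhattan distance as the pluggable metric. It is not the implementation used in the study: the function names (`manhattan`, `centroid`, `davies_bouldin`) and the toy data are assumptions introduced here, and clusters are assumed to be summarized by their mean vectors. Lower values indicate more compact, better separated clusters.

```python
from statistics import fmean


def manhattan(a, b):
    """Manhattan (city-block) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a list of vectors (assumed cluster summary)."""
    return [fmean(column) for column in zip(*points)]


def davies_bouldin(clusters, dist=manhattan):
    """Davies-Bouldin index of a partition.

    clusters: list of clusters, each a non-empty list of numeric vectors.
    dist: distance function; any metric with the same signature can be used.
    """
    if len(clusters) < 2:
        # With a single cluster there is no pair to compare.
        raise ValueError("index is undefined for fewer than two clusters")
    centers = [centroid(c) for c in clusters]
    # Scatter: mean distance of each cluster's members to its own centroid.
    scatter = [
        fmean([dist(p, center) for p in c])
        for c, center in zip(clusters, centers)
    ]
    worst_ratios = []
    for i in range(len(clusters)):
        # For cluster i, keep the worst (largest) similarity to another cluster.
        ratios = [
            (scatter[i] + scatter[j]) / dist(centers[i], centers[j])
            for j in range(len(clusters))
            if j != i
        ]
        worst_ratios.append(max(ratios))
    return fmean(worst_ratios)


# Toy example (hypothetical data): two tight, well separated clusters.
toy = [[[0, 0], [0, 2]], [[10, 0], [10, 2]]]
print(davies_bouldin(toy))
```

In this sketch a partition with only one cluster raises an error, which mirrors why no index value can be reported when a metric yields a single cluster.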
Among the 1457 patients excluded because of missing values (ie, not used to obtain the clusters), we performed a validation analysis on those who had no missing values in the variables that differed significantly among the three clusters obtained. These patients were then assigned to one of the previously obtained clusters using the best distance metric identified in the clustering process described above.
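The assignment step can be pictured as a nearest-centroid rule. The sketch below is illustrative only and is not the authors' assignment tool. It assumes that each cluster is represented by the cluster means of four laboratory tests (AST, LDH, CRP, and neutrophil count) as reported in the results table, that the Manhattan distance is used, and that values are compared in raw units without rescaling; the paper does not state the actual centroids or any feature scaling. The names `CENTROIDS` and `assign_cluster` are hypothetical. Note that, in raw units, LDH dominates the distance because of its larger numeric range.

```python
# Stand-in centroids (ASSUMPTION): cluster means of AST (U/L), LDH (U/L),
# CRP (mg/L), and neutrophils (x10^3/uL) taken from the results table.
CENTROIDS = {
    1: (80.3, 1339.72, 206.1, 8.4),
    2: (55.8, 742.5, 152.1, 6.9),
    3: (32.8, 447.7, 64.2, 4.9),
}


def manhattan(a, b):
    """Manhattan (city-block) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def assign_cluster(patient, centroids=CENTROIDS):
    """Return the label of the centroid closest to the patient vector.

    patient: (AST, LDH, CRP, neutrophils) in the same units as the centroids.
    Patients with any missing value are rejected, as in the validation
    analysis, which only used patients without missing values.
    """
    if len(patient) != 4 or any(value is None for value in patient):
        raise ValueError("all four laboratory values are required")
    return min(centroids, key=lambda label: manhattan(patient, centroids[label]))


# Hypothetical patient with low AST, LDH, CRP, and neutrophil values.
print(assign_cluster((30.0, 450.0, 60.0, 5.0)))
```

A production tool would need the real cluster representatives and whatever preprocessing was applied before clustering; this sketch only shows the shape of the rule.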
Differences in the 850 variables among the clusters obtained were tested using a one-way multivariate analysis of variance. Pairwise post hoc comparisons between clusters were analyzed with the Bonferroni test. Significance was accepted at the 5% level (α=.05). The observed power and effect size, as partial η², were reported for statistically significant differences.
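To make the reported quantities concrete, the following sketch computes the F statistic and η² for a single variable across groups, together with a Bonferroni-adjusted significance threshold. It is a simplification and not the analysis performed in the study: it swaps in a univariate one-way analysis of variance for one variable at a time, whereas the paper describes a multivariate analysis. In a one-way design, partial η² coincides with η², that is, the between-group sum of squares divided by the total sum of squares. The function names and toy data are assumptions, and P values and observed power are not computed here.

```python
from statistics import fmean


def one_way_anova(groups):
    """Univariate one-way ANOVA for a list of samples (one list per cluster).

    Returns (F, df_between, df_within, eta_squared).
    Assumes at least two groups and non-zero within-group variability.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    group_means = [fmean(g) for g in groups]
    # Variability explained by cluster membership.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Residual variability inside each cluster.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_squared


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level after Bonferroni correction."""
    return alpha / n_comparisons


# Toy example (hypothetical values for one laboratory test in three clusters).
f_stat, df_b, df_w, eta_sq = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat, df_b, df_w, eta_sq)
# Three clusters give three pairwise post hoc comparisons.
print(bonferroni_alpha(0.05, 3))
```

For the toy data, the effect size can also be recovered from the F statistic as F·df_between / (F·df_between + df_within), which is a convenient consistency check for a univariate one-way design.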
Number of clusters and the corresponding Davies-Bouldin index.
Similarity measure | Davies-Bouldin index | Number of clusters |
Euclidean distance | 0.948 | 3 |
Canberra distance | N/Aa | 1 |
Chebyshev distance | 0.966 | 3 |
Correlation similarity | 1.400 | 3 |
Cosine similarity | 1.629 | 3 |
Dice similarity | N/A | 1 |
Inner product similarity | N/A | 1 |
Jaccard similarity | 1.387 | 3 |
Kernel Euclidean distance | 1.440 | 3 |
Manhattan distance | 0.701 | 3 |
Max product similarity | N/A | 1 |
Overlap similarity | 5.099 | 4 |
Generalized divergence | 3.445 | 3 |
Itakura-Saito distance | 5.919 | 4 |
Kullback-Leibler divergence | 5.677 | 4 |
Logarithmic loss | 4.595 | 4 |
Logistic loss | 3.445 | 3 |
Mahalanobis distance | 4.595 | 4 |
Squared Euclidean distance | 3.445 | 3 |
Squared loss | 3.659 | 3 |
aN/A: not applicable; the Davies-Bouldin index could not be calculated for these measures because they only had one cluster each.
Demographic and clinical characteristics of the patients in the three clusters are shown in
Regarding laboratory tests, patients in cluster 1 showed significantly higher levels of serum creatinine, potassium, and blood urea nitrogen than those in clusters 2 and 3; cluster 1 patients also had a significantly higher value of red cell distribution width than did cluster 2 patients. Patients in cluster 3 presented with significantly higher values of lymphocytes and serum levels of sodium, and significantly lower platelet counts, than patients in cluster 2. In addition, cluster 3 patients showed lower values of mean corpuscular hemoglobin concentration and leucocytes, serum levels of alanine transaminase (ALT), and percentage of neutrophils than did patients in clusters 1 and 2. Cluster 3 patients also had significantly higher values and percentages of eosinophils and percentages of lymphocytes than did patients in clusters 1 and 2. Finally, four laboratory tests differed significantly between all clusters: the serum levels of aspartate transaminase (AST) (cluster 1 > cluster 2 > cluster 3), lactate dehydrogenase (LDH) (cluster 1 > cluster 2 > cluster 3), and C-reactive protein (CRP) (cluster 1 > cluster 2 > cluster 3), and the number of neutrophils (cluster 1 > cluster 2 > cluster 3).
Demographic and clinical characteristics of patients (N=853) in the three clusters.
Characteristics | Cluster 1 (n=58) | Cluster 2 (n=300) | Cluster 3 (n=495) | F test | P value | η²a | 1–βb |
Age (years)c, mean (SD) | 71.1 (13.7)d | 67.0 (15.1)d,e | 65.1 (16.2)e | 3.457 | .03 | 0.009 | 0.648 |
Sex (men)c, n (%) | 41 (70.7)d | 181 (60.3)d | 313 (63.3)d | 1.027 | .36 | 0.003 | 0.23 |
Inpatient hospital daysc, mean (SD) | 8.5 (4.9)d | 8.6 (6.4)d | 8.3 (5.1)d | 0.363 | .70 | 0.001 | 0.109 |
Outcomec, n (%) | | | | 26.054 | <.001 | 0.062 | 1 |
Recovered | 31 (53.4)d | 246 (82.0)e | 443 (89.5)e |
Deceased | 27 (46.6)d | 54 (18.0)e | 52 (10.5)e |
Intensive care unit admissionc, n (%) | | | | 1.12 | .33 | 0.003 | 0.248 |
No | 52 (89.7)d | 277 (92.3)d | 458 (92.5)d |
Yes | 6 (10.3)d | 23 (7.7)d | 37 (7.5)d |
Days until intensive care unit admissionc, mean (SD) | 0.2 (0.4)d | 3.4 (6.3)d | 2.3 (4.3)d | 1.393 | .26 | 0.042 | 0.289 |
Days in intensive care unitc, mean (SD) | 0.2 (0.4)d | 4.8 (6.5)d,e | 7.6 (6.9)e | 3.747 | .03 | 0.106 | 0.665 |
Mechanical ventilation needc, n (%) | 35 (60.3)d | 177 (59.0)d | 277 (56.0)d | 0.163 | .85 | <0.001 | 0.075 |
First heart rate measurement in the emergency department | 98.4 (25.0)d,e | 100.1 (26.2)d | 93.5 (24.4)e | 8.45 | <.001 | 0.021 | 0.965 |
First oxygen saturation measurement in the emergency department | 84.2 (12.3)d | 90.1 (7.6)e | 94.2 (3.6)f | 81.732 | <.001 | 0.171 | 1 |
Last heart rate measurement in the emergency department | 99.0 (25.1)d,e | 100.1 (26.0)d | 93.6 (24.7)e | 8.104 | <.001 | 0.02 | 0.958 |
Last oxygen saturation measurement in the emergency department | 84.2 (12.2)d | 90.0 (7.52)e | 94.2 (3.6)f | 82.554 | <.001 | 0.172 | 1 |
Red cell distribution width (%) | 13.6 (1.9)d | 12.9 (1.84)e | 13.0 (1.9)d,e | 3.28 | .04 | 0.008 | 0.623 |
Basophils (×10³/µL) | 0.03 (0.03)d | 0.02 (0.02)d,e | 0.02 (0.0)e | 5.545 | .004 | 0.014 | 0.854 |
Mean corpuscular hemoglobin concentration (g/dL) | 33.9 (1.5)d | 34.0 (1.17)d | 33.6 (1.2)e | 8.602 | <.001 | 0.021 | 0.968 |
Creatinine (mg/dL) | 1.3 (1.4)d | 1.0 (0.47)e | 1.0 (0.5)e | 9.591 | <.001 | 0.024 | 0.981 |
Eosinophils (×10³/µL) | 0.02 (0.04)d | 0.02 (0.04)d | 0.04 (0.1)e | 6.518 | .002 | 0.016 | 0.908 |
Eosinophils (%) | 0.20 (0.5)d | 0.3 (0.60)d | 0.6 (1.2)e | 10.000 | <.001 | 0.025 | 0.985 |
Aspartate transaminase (U/L) | 80.3 (48.0)d | 55.8 (33.4)e | 32.8 (18.7)f | 109.193 | <.001 | 0.216 | 1 |
Alanine transaminase (U/L) | 57.2 (69.1)d | 50.7 (48.1)d | 29.5 (23.8)e | 32.686 | <.001 | 0.076 | 1 |
Potassium (mmol/L) | 4.6 (0.8)d | 4.2 (0.6)e | 4.2 (0.5)e | 16.957 | <.001 | 0.041 | 1 |
Lactate dehydrogenase (U/L) | 1339.72 (240.56)d | 742.5 (122.0)e | 447.7 (91.5)f | 1666.635 | <.001 | 0.808 | 1 |
Leucocytes (×10³/µL) | 9.9 (4.8)d | 8.5 (4.2)d | 6.9 (5.2)e | 13.055 | <.001 | 0.032 | 0.997 |
Lymphocytes (×10³/µL) | 1.0 (0.5)d,e | 1.0 (0.6)d | 1.3 (2.1)e | 3.692 | .03 | 0.009 | 0.679 |
Lymphocytes (%) | 12.6 (7.8)d | 14.0 (7.7)d | 20.0 (9.8)e | 46.962 | <.001 | 0.106 | 1 |
Monocytes (%) | 5.1 (2.9)d | 6.6 (3.9)d | 8.7 (4.8)e | 29.321 | <.001 | 0.069 | 1 |
Sodium (mmol/L) | 136.2 (7.1)d,e | 136.2 (4.4)d | 137.2 (4.6)e | 4.016 | .02 | 0.01 | 0.718 |
Neutrophils (×10³/µL) | 8.4 (4.7)d | 6.9 (4.0)e | 4.9 (2.7)f | 45.584 | <.001 | 0.103 | 1 |
Neutrophils (%) | 81.8 (10.2)d | 78.8 (9.9)d | 70.4 (11.9)e | 62.070 | <.001 | 0.135 | 1 |
C-reactive protein (mg/L) | 206.1 (131.7)d | 152.1 (110.0)e | 64.2 (63.7)f | 12.930 | <.001 | 0.223 | 1 |
Platelet count (×10³/µL) | 229.0 (92.2)d,e | 236.3 (96.6)d | 210.3 (87.2)e | 7.541 | .001 | 0.019 | 0.944 |
Blood urea nitrogen (mg/dL) | 58.9 (56.6)d | 41.8 (29.0)e | 40.5 (29.7)e | 7.579 | .001 | 0.019 | 0.945 |
Previous history of disorders of purine and pyrimidine metabolism | 4 (6.9)d | 4 (1.3)e | 25 (5.1)d | 4.179 | .02 | 0.01 | 0.736 |
Previous history of epilepsy and recurrent seizures | 3 (5.2)d | 4 (1.3)e | 2 (0.4)e | 5.660 | .004 | 0.014 | 0.862 |
Previous history of emphysema | 3 (5.2)d | 2 (0.7)e | 2 (0.4)e | 6.663 | .001 | 0.017 | 0.914 |
Previous history of thoracic, thoracolumbar, and lumbosacral intervertebral disc disorders | 0 (0)d,e | 9 (3.0)d | 3 (0.6)e | 4.385 | .01 | 0.011 | 0.758 |
Previous history of surgical procedures | 0 (0)d,e | 4 (1.3)d | 0 (0)e | 3.753 | .02 | 0.009 | 0.686 |
Surgical operations during the current hospitalization | 3 (5.2)d | 2 (0.7)e | 4 (0.8)e | 4.880 | .008 | 0.012 | 0.804 |
aEffect size.
bObserved power.
cThese variables were not used for the cluster construction.
d-fValues in the same row, but in different columns, that do not share footnote letters were significantly different after Bonferroni post hoc correction; values in the same row, but in different columns, that share footnote letters were not significantly different.
For a clearer characterization of the clusters,
A web-based cluster assignment tool, based on the results reported here, can be found online [
To test the robustness of the identified clusters, we performed a validation analysis using the initially excluded patients who did not have missing values in the variables that statistically differed among the three clusters (
Hospital stay, outcome measures, and laboratory tests that showed statistically significant differences among clusters with a medium or high effect size (η²>0.06). Note that some variables are scaled (transformation between brackets) for the sake of graph legibility. ALT: alanine transaminase; AST: aspartate transaminase; CRP: C-reactive protein; ICU: intensive care unit; LDH: lactate dehydrogenase.
Demographics as well as hospital stay and prognosis of the patients (n=349) selected for the validation analysis in the three clusters.
Characteristics | Cluster 1 (n=18) | Cluster 2 (n=112) | Cluster 3 (n=219) | F test (df) | P value | η²a | 1–βb |
Age (years), mean (SD) | 72.8 (14.2)c,d | 71.3 (14.3)c | 64.2 (15.8)d | 9.414 (2, 346) | <.001 | 0.052 | 0.979 |
Sex (men), n (%) | 14 (77.8)c | 68 (60.7)c | 123 (56.2)c | 1.738 (2, 346) | .18 | 0.01 | 0.364 |
Inpatient hospital days, mean (SD) | 9.1 (6.4)c | 9.3 (5.9)c | 8.0 (5.3)c | 2.320 (2, 346) | .10 | 0.013 | 0.469 |
Outcome, n (%) | | | | 22.025 (2, 346) | <.001 | 0.113 | 1 |
Recovered | 8 (44.4)c | 80 (71.4)d | 200 (91.3)e |
Deceased | 10 (55.6)c | 32 (28.6)d | 19 (8.7)e |
Intensive care unit admission, n (%) | | | | 4.268 (2, 346) | .02 | 0.024 | 0.743 |
No | 16 (88.9)c,d | 101 (90.2)c | 213 (97.3)d |
Yes | 2 (11.1)c,d | 11 (9.8)c | 6 (2.7)d |
Days until intensive care unit admission, mean (SD) | 6.5 (7.8)c | 4.1 (3.9)c | 6.3 (13.7)c | 0.170 (2, 16) | .84 | 0.021 | 0.072 |
Days in intensive care unit, mean (SD) | 4.5 (0.7)c | 3.8 (4.5)c | 3.2 (4.6)c | 0.082 (2, 16) | .92 | 0.01 | 0.06 |
Mechanical ventilation need, n (%) | 12 (66.7)c | 54 (48.2)c | 96 (43.8)c | 1.854 (2, 346) | .16 | 0.011 | 0.385 |
aEffect size.
bObserved power.
c-eValues in the same row, but in different columns, that do not share footnote letters were significantly different after Bonferroni post hoc correction; values in the same row, but in different columns, that share footnote letters were not significantly different.
By applying an unsupervised machine learning approach, we were able to identify and segregate patients with COVID-19 into subgroups according to disease severity, using only standard laboratory tests performed during the first medical assessment in the emergency department. We found that inflammatory (ie, CRP), hematologic (ie, number of neutrophils and percentage of monocytes and lymphocytes), and serum biochemical abnormalities (ie, AST, ALT, and LDH), mainly indicating liver dysfunction, detected upon admission to the hospital could predict the severity of the disease. From a total of 850 variables collected in the emergency department, only four standard laboratory tests (ie, serum levels of AST, LDH, CRP, and the number of neutrophils) were enough to segregate these patients into three separate clusters. Of these, the levels of LDH had the largest effect size, practically allowing us to differentiate the three clusters linearly. Further, the percentages of monocytes and lymphocytes as well as ALT levels distinguished cluster 3 patients (ie, less severe) from patients in the other two clusters. Cluster 1 was characterized by the highest proportion of deceased patients; the highest levels of AST, ALT, LDH, and CRP; the highest number of neutrophils; and the lowest percentages of monocytes and lymphocytes (
Our results have several clinical implications. First, age and sex were not considered while building the clusters. Therefore, our unsupervised machine learning approach, based exclusively on simple laboratory tests performed at an early stage, would permit the establishment of both a strategy for rationing health care resources and a triage protocol that supports medical decisions in a transparent and ethical way. Second, since the analyzed data come from standard laboratory tests, this method would be especially valuable for underdeveloped and developing regions that lack medical resources and face affordability issues. Finally, treatment could be tailored to each severity group at an early stage (ie, in the emergency department). For example, more aggressive therapies could be considered in patients classified in cluster 1 (ie, the most severe) and not in those in cluster 3 (ie, the least severe).
Initially, SARS-CoV-2 was primarily considered a respiratory pathogen. However, with time, it has behaved like a virus with the potential to cause multisystem involvement [
Although one previous multicenter study, based on the analyses of demographics, comorbidities, vital signs, and laboratory test results upon admission, that evaluated the prediction of disease course in patients with COVID-19 has been undertaken [
The study should be interpreted within the context of several limitations. First, the patients in this study may represent a selected group of patients with COVID-19 (ie, patients with more severe disease, since all of them were admitted to the hospital); hence, it is questionable to what extent our results can be generalized to the entire population of patients with COVID-19. The reason is that the extreme circumstances in our hospitals at the peak of the pandemic permitted the hospitalization of only the most severe cases. Nevertheless, our aim was to detect severity subgroups among patients with COVID-19 upon admission to the hospital. Second, we retained the records, laboratory tests, and clinical variables of only 853 patients from the data set because of the high number of missing values in the remaining 1457 patients. Despite this, the results proved robust.
In closing, to the authors’ knowledge, the work presented in this paper is the first attempt to use unsupervised machine learning to identify severity subgroups among patients with COVID-19 upon admission. A few affordable, simple, and standard laboratory tests, which are expected to be available in any emergency department, have shown promising discriminative power for characterization of severity subgroups among patients with COVID-19. We have also provided an online severity cluster assignment tool for patients with COVID-19 who are admitted to the emergency department [
ALT: alanine transaminase
AST: aspartate transaminase
BIC: Bayesian information criterion
CRP: C-reactive protein
ICD-10: International Statistical Classification of Diseases and Related Health Problems, 10th Revision
ICU: intensive care unit
LDH: lactate dehydrogenase
JBL collaborated in the conception, organization, and execution of the research project; the writing of the first draft of the manuscript; and the review and critique of the manuscript. MDDC collaborated in the conception and organization of the research project, the statistical analyses, writing of the first draft of the manuscript, and the review and critique of the manuscript. AE collaborated in the organization of the research project and the review and critique of the manuscript. RG collaborated in the organization of the research project and the review and critique of the manuscript. SD collaborated in the organization of the research project and the review and critique of the manuscript. JIS collaborated in the conception and organization of the research project, the statistical analyses, the writing of the first draft of the manuscript, and the review and critique of the manuscript.
None declared.