Background: Within the trauma system, the emergency department (ED) is the hospital’s first contact and is vital for allocating medical resources. However, there is generally limited information about patients that die in the ED.
Objective: The aim of this study was to develop an artificial intelligence (AI) model to predict trauma mortality and analyze pertinent mortality factors for all patients visiting the ED.
Methods: We used the Korean National Emergency Department Information System (NEDIS) data set (N=6,536,306), incorporating over 400 hospitals between 2016 and 2019. We included the International Classification of Diseases 10th Revision (ICD-10) codes and chose the following input features to predict ED patient mortality: age, sex, intentionality, injury, emergent symptom, Alert/Verbal/Painful/Unresponsive (AVPU) scale, Korean Triage and Acuity Scale (KTAS), and vital signs. We compared the performance of three feature sets as AI input: all features (n=921), ICD-10 features (n=878), and features excluding ICD-10 codes (n=43). We devised various machine learning models with an ensemble approach via 5-fold cross-validation and compared the performance of each model with that of traditional prediction models. Lastly, we investigated explainable AI feature effects and deployed our final AI model on a public website, providing access to our mortality prediction results among patients visiting the ED.
Results: Our proposed AI model with the all-feature set achieved the highest area under the receiver operating characteristic curve (AUROC) of 0.9974 (adaptive boosting [AdaBoost], AdaBoost + light gradient boosting machine [LightGBM]: Ensemble), outperforming other state-of-the-art machine learning and traditional prediction models, including extreme gradient boosting (AUROC=0.9972), LightGBM (AUROC=0.9973), ICD-based injury severity scores (AUROC=0.9328 for the inclusive model and AUROC=0.9567 for the exclusive model), and KTAS (AUROC=0.9405). In addition, our proposed AI model outperformed a cutting-edge AI model designed for in-hospital mortality prediction (AUROC=0.7675) for all ED visitors. From the AI model, we also discovered that age and unresponsiveness (coma) were the top two mortality predictors among patients visiting the ED, followed by oxygen saturation, multiple rib fractures (ICD-10 code S224), painful response (stupor, semicoma), and lumbar vertebra fracture (ICD-10 code S320).
Conclusions: Our proposed AI model exhibits remarkable accuracy in predicting ED mortality. Although external validation is still necessary, a large nationwide data set would provide a more accurate model and minimize overfitting. We anticipate that our AI-based risk calculator tool will substantially aid health care providers, particularly regarding triage and early diagnosis for trauma patients.
Trauma is a leading cause of mortality worldwide, especially among those aged under 45 years. Despite recent advances, trauma-related mortality remains a substantial challenge. In the trauma system, the emergency department (ED) is the hospital’s first contact and is vital for allocating medical resources. Decision-making in the ED is crucial and determines further treatment or diagnosis; thus, decisions should be made promptly and accurately so that a patient with severe trauma is treated within the “golden hour.” Predicting ED mortality is critical for improving the trauma system and reducing individual, medical staff, and societal burdens. ED mortality is affected by injury severity, patient demographics, prehospital care, ED trauma care quality, and other complex and multifactorial elements, which makes it challenging to gain an in-depth understanding of the key contributing factors as a whole. Furthermore, ED mortality involves considerably more severe injuries than in-hospital mortality, and most patients who die in the ED may not receive additional diagnostic workups or treatments such as computed tomography (CT), angiography, or surgery. Due to hemodynamic instability, a complete workup is nearly impossible for unstable ED trauma patients. Therefore, ED mortality prediction depends on limited information, a task that could be facilitated by recent advancements in artificial intelligence (AI) technology.
Our research team previously developed two AI models for predicting in-hospital trauma patient mortality; one utilizes Abbreviated Injury Scale (AIS) codes and the other incorporates the International Classification of Diseases, 10th revision (ICD-10) codes and other variables from the Korean National Emergency Department Information System (NEDIS) data set. Deceased ED patients were excluded from these models because we assumed that they may have received an insufficient workup or an inaccurate diagnosis. Consequently, these AI models exhibited high accuracy for predicting in-hospital mortality but did not learn from deceased ED patient data.
We deemed that ED patients who died before admission sustained severe injuries and that their information was insufficient compared to that available for patients who died after intensive care unit (ICU) or ward admission. We concluded that ED mortality should be predicted using a different patient data set than in-hospital mortality, and that the new model should also have alternative weights and input variables. Therefore, in this study, we developed AI models for predicting ED mortality in trauma patients using the NEDIS data that were not incorporated into the previous AI models.
Patients and Data Set
The Korean National Emergency Medical Center has gathered NEDIS data from over 400 hospitals in Korea since 2016. This study employed the 2016 to 2019 NEDIS data set (data acquisition number N20212920825) to develop an AI model for predicting trauma mortality among all patients visiting the ED according to the Transparent Reporting of a Multivariable Model for Individual Prognosis or Diagnosis (TRIPOD) statement. Data used for the AI model included those related to patients experiencing physical (but not psychological) trauma with an “S” or “T” diagnostic code from the ICD-10 (N=7,664,443); the S code represents trauma in a single body region, whereas T signifies trauma in multiple or unspecified regions. All ED patient data with an S or T diagnosis code were included, regardless of ward or ICU admittance or discharge.
Patient data with the following conditions were excluded: (1) patients who died before or upon hospital arrival (n=9506, 0.12%), who were regarded as dead-on-arrival and did not undergo any treatment or cardiopulmonary resuscitation; (2) ED patients who died due to severe conditions without further treatment, such as cardiac arrest without receiving cardiopulmonary resuscitation, despite being brought to the hospital alive (n=2800, 0.04%); (3) nontrauma patients (n=156, 0.002%); (4) patients transferred to another hospital from the ED (n=124,180, 1.62%); (5) patients with end-stage disease such as cancer who wished to receive care at home (n=2050, 0.03%); (6) patients who had to leave the hospital because there was no hope of recovery (n=270, 0.003%); (7) voluntarily discharged patients (n=162,499, 2.12%); (8) patients for whom it was difficult to record data due to being unidentifiably detained for criminal activity (n=29,242, 0.38%); (9) patients without S or T diagnostic codes, since all physical trauma receives an S or T diagnostic code; and (10) patients with frostbite (ICD-10 T33-T35.6), intoxication (ICD-10 T36-T65), or unspecified injury or complication diagnostic codes (ICD-10 T66-T78, T80-T88) (n=797,434, 10.4%). Our AI model was trained and tested using data from 6,536,306 patients. The patient selection process is presented in the accompanying flow diagram.
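As a quick consistency check, the exclusion arithmetic above can be reproduced in a few lines of Python. The dictionary keys are shorthand labels introduced here for illustration only; criterion 9 has no reported count in the text and is therefore omitted.

```python
# Counts are taken from the exclusion list in the text; keys are illustrative labels.
initial_cohort = 7_664_443

exclusions = {
    "dead_on_arrival": 9_506,
    "died_without_further_treatment": 2_800,
    "nontrauma": 156,
    "transferred_to_other_hospital": 124_180,
    "end_stage_disease_home_care": 2_050,
    "discharged_no_hope_of_recovery": 270,
    "voluntary_discharge": 162_499,
    "unidentifiable_detained": 29_242,
    "frostbite_intoxication_unspecified": 797_434,
}

total_excluded = sum(exclusions.values())
final_cohort = initial_cohort - total_excluded

# Print each exclusion as a share of the initial cohort, then the remaining cohort.
for label, n in exclusions.items():
    print(f"{label}: {n} ({100 * n / initial_cohort:.3f}%)")
print("final cohort:", final_cohort)
```

The remaining cohort equals the 6,536,306 patients reported for model development.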
AI Model Variables
This study followed the Developing and Reporting Machine Learning Predictive Models in Biomedical Research guidelines. We used the following 14 NEDIS data variables for AI model input: age; gender; intentionality; injury mechanism; emergent symptom; Alert, Verbal, Painful, Unresponsive (AVPU) scale; initial Korean Triage and Acuity Scale (KTAS); systolic blood pressure; diastolic blood pressure; pulse rate per minute; respiratory rate per minute; body temperature; oxygen saturation; and ICD-10 codes. Intentionality includes six categories: accidental/unintentional, self-harm/suicide, violence/assault, other specified, unspecified, and missing data. The injury mechanism comprises 16 categories: car accident, bike accident, motorcycle accident, other traffic accidents, unspecified traffic accident, fall, slipped, struck, firearm/cut/pierce, machine, fire/flames/heat, drowning, poisoning, choking/hanging, others, and unknown. Emergency and nonemergency are the two emergent symptom categories. The AVPU scale is a simplified version of the Glasgow Coma Scale (GCS) and includes four categories: A, alert; V, verbal responsive (drowsy); P, painful response (stupor, semicoma); and U, unresponsive (coma).
The KTAS is a standardized triage tool that avoids complexity and ambiguity by employing five categories: Level 1, resuscitation; Level 2, emergent; Level 3, urgent; Level 4, less urgent; and Level 5, nonurgent. The KTAS was initially developed as an ED severity triage tool in 2012, based on the Canadian Triage and Acuity Scale. According to NEDIS policy, a certified faculty member must conduct the initial KTAS within 2 minutes of ED admission. ICD-10 codes starting with S or T entail 878 categories. All AI model variables are summarized in Table S1. The mortality of patients visiting the ED was defined as a patient with a dead result code or an indication of discharge with medical futility in NEDIS.
Data Split and Cross-Validation
Training and testing data used in this study are detailed in Table S2. Data from 6,536,306 patients were divided into training (n=5,229,008) and testing (n=1,307,252) data sets with an 8:2 ratio in a stratified fashion. Only the testing set was used to test our developed AI model. We first performed 5-fold cross-validation in the training data set to assess how the prediction results generalize to an independent data set. The training data set was randomly shuffled and stratified into five equal groups, of which four were selected for training and the remaining group was used for internal validation. This process was repeated five times by shifting the internal validation group.
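The stratified 5-fold procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's code: the function names and the toy labels are hypothetical, and a library routine would normally be used instead.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign sample indices to n_folds groups while keeping class ratios similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices round-robin so every fold gets a near-equal share.
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


def cross_validation_splits(labels, n_folds=5, seed=0):
    """Yield (train_indices, validation_indices), shifting the validation fold each time."""
    folds = stratified_folds(labels, n_folds, seed)
    for k in range(n_folds):
        validation = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, validation


# Toy labels (hypothetical): 1 = deceased (rare), 0 = survived.
labels = [1] * 10 + [0] * 90
splits = list(cross_validation_splits(labels))
for train, validation in splits:
    print(len(train), len(validation), sum(labels[i] for i in validation))
```

Each validation fold in this toy example holds the same number of deceased cases, which is the property that stratification is meant to guarantee when the positive class is rare.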
Handling Data Imbalance
The data were severely imbalanced, as only 6351 (0.1%) deceased patients were recorded (Table S2). To minimize model bias toward the majority (ie, the survived patient group), we used the synthetic minority oversampling technique (SMOTE) to upsample the deceased patient group to match the size of the survived patient group. Next, we identified each group’s optimized weight values for the loss function during model training. To search for these weight values, we iteratively learned demographic parity-based coefficients, which provide a closed-form expression for the data weights. These two methods mitigated bias toward survived patient data.
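The core idea of SMOTE, interpolating between a minority sample and one of its nearest minority neighbors, can be sketched as follows. This is a simplified illustration and not the implementation used in the study; the function name, the value of `k`, and the toy points are assumptions made for the example.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbor.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return synthetic


# Toy minority-class points (hypothetical, two numeric features each).
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(minority_points, n_new=6)
print(new_points)
```

Because every synthetic point lies on a segment between two real minority samples, the oversampled class stays inside the region already occupied by deceased-patient data rather than being duplicated verbatim.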
Machine Learning Models
We used three feature sets to develop our AI model: the first set included all 921 features (43 features from the NEDIS variables plus 878 features from ICD-10 codes), the second set used only the 878 features from ICD-10 codes, and the third set used the 43 features that remained after excluding ICD-10 codes. We applied eight machine learning models to each feature set: adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), gradient boosting machine (GBM), extremely random trees (ERT), logistic regression (LR), random forest (RF), and deep neural network (DNN). We chose the best three among the eight models and applied an ensemble approach by considering all possible combinations. Finally, we evaluated the feature importance, listing features in the order they contributed to mortality prediction.
In decision tree approaches such as AdaBoost, XGBoost, and LightGBM, feature importance values are calculated from the decrease in node impurity, weighted by the probability of reaching each node. Node impurity is measured using the Gini index, which quantifies how often a randomly chosen sample at a given node would be incorrectly classified if it were labeled at random according to the class distribution in that node. It is computed as 1 minus the sum of each class’s squared probabilities within the node. Combining node probabilities, impurity measures, and feature importance calculations in this way shows how different variables affect the prediction outcomes.
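Enumerating every combination of the three best models can be sketched as below. The text does not state how member predictions were combined, so the averaging of predicted probabilities (soft voting) shown here is an assumption, and the function names are hypothetical.

```python
from itertools import combinations


def candidate_ensembles(model_names):
    """Return every non-empty combination of the given models."""
    combos = []
    for size in range(1, len(model_names) + 1):
        combos.extend(combinations(model_names, size))
    return combos


def soft_vote(probabilities_per_model):
    """Average predicted mortality probabilities across models (assumed combination rule)."""
    n_models = len(probabilities_per_model)
    return [sum(sample) / n_models for sample in zip(*probabilities_per_model)]


candidates = candidate_ensembles(["AdaBoost", "XGBoost", "LightGBM"])
for combo in candidates:
    print(" + ".join(combo))

# Toy probabilities (hypothetical) from two models for two patients.
averaged = soft_vote([[0.9, 0.2], [0.7, 0.4]])
print(averaged)
```

Three base models yield three single models, three pairs, and one triple, which is why seven candidates are compared in the results.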
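The impurity-based calculation described above can be illustrated for a single split. This is a minimal sketch of the standard Gini impurity and weighted impurity decrease; the function names and class counts are hypothetical, and a tree library would sum these decreases per feature and normalize them.

```python
def gini(class_counts):
    """Gini impurity: 1 minus the sum of squared class probabilities in a node."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


def impurity_decrease(parent, left, right, n_total):
    """Impurity decrease of one split, weighted by the probability of reaching the node."""
    n_parent = sum(parent)
    n_left, n_right = sum(left), sum(right)
    reach_probability = n_parent / n_total
    return reach_probability * (
        gini(parent)
        - (n_left / n_parent) * gini(left)
        - (n_right / n_parent) * gini(right)
    )


# Toy node with [survived, deceased] counts (hypothetical) that a split separates perfectly.
parent_counts = [50, 50]
decrease = impurity_decrease(parent_counts, left=[50, 0], right=[0, 50], n_total=100)
print(gini(parent_counts), decrease)
```

A feature that is never chosen for a split accumulates no impurity decrease, which is consistent with many features receiving an importance of exactly zero.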
Performance evaluations were based on 5-fold cross-validation using the following metrics: sensitivity, specificity, accuracy, balanced accuracy, and area under the receiver operating characteristic (ROC) curve (AUROC). Due to the significant data imbalance, we used balanced accuracy as the primary model evaluation metric.
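The reason for preferring balanced accuracy under heavy class imbalance can be seen in a small example. The sketch below uses hypothetical labels and a hypothetical function name; it computes the listed threshold-based metrics from a confusion matrix.

```python
def binary_metrics(y_true, y_pred):
    """Return sensitivity, specificity, accuracy, and balanced accuracy.

    Labels: 1 = deceased (positive class), 0 = survived.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / len(y_true),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Toy data (hypothetical): 1 death in 1000 visits and a model that always predicts survival.
y_true = [1] + [0] * 999
y_pred = [0] * 1000
result = binary_metrics(y_true, y_pred)
print(result)
```

In this toy case, plain accuracy looks excellent even though no death is detected, whereas balanced accuracy falls to chance level. Applying the same formula to the sensitivity (0.9738) and specificity (0.9888) reported for the final model gives (0.9738 + 0.9888) / 2 = 0.9813, matching the reported balanced accuracy.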
The models were implemented in Python (version 3.7.13) with NumPy (version 1.21.6), Pandas (version 1.3.5), Matplotlib (version 3.5.1), and Scikit-learn (version 1.0.2). All statistical analyses were performed with R software version 4.1.2 (R Foundation for Statistical Computing). Continuous variables are presented using the mean and standard deviation, while categorical data are presented using proportions. Continuous data were compared using the Student t-test or the Mann-Whitney U test as appropriate. Similarly, proportions were compared through χ2 or Fisher exact tests as appropriate. A two-sided P value <.05 was considered statistically significant.
To further evaluate the performance of our AI model, we implemented conventional metrics for comparison: the ICD-10-based inclusive survival risk ratio (SRR) and exclusive SRR, and KTAS. The ICD-based Injury Severity Score (ICISS) utilizes the SRR to calculate survival probability. The SRR is the number of surviving patients with a specific injury code divided by the number of all patients with the same code. Patient survival probability was determined by multiplying the SRRs of all of a patient’s injury codes. The traditional ICISS was calculated as the product of the survival probabilities for up to 10 injuries. There are two SRR calculation approaches: inclusive and exclusive. The inclusive SRR is calculated for each injury without considering associated injuries. In contrast, the exclusive SRR divides the number of survivors with an isolated specific injury by the total number of patients with only that injury. Thus, patients with multiple injuries were excluded from exclusive SRR calculations. We used the survival probability determined from our previous study because other studies did not use ED mortality.
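The SRR and ICISS definitions above translate directly into code. The following is a minimal sketch on made-up records: the function names are hypothetical, and two details are assumptions not stated in the text, namely that the 10 injuries retained are those with the lowest SRRs and that codes without an SRR default to 1.0.

```python
from collections import defaultdict


def survival_risk_ratios(records, exclusive=False):
    """Compute SRRs from (list_of_icd10_codes, survived) records."""
    survivors = defaultdict(int)
    totals = defaultdict(int)
    for codes, survived in records:
        unique_codes = set(codes)
        if exclusive and len(unique_codes) != 1:
            continue  # the exclusive SRR only uses patients with one isolated injury
        for code in unique_codes:
            totals[code] += 1
            if survived:
                survivors[code] += 1
    return {code: survivors[code] / totals[code] for code in totals}


def iciss(codes, srr, max_injuries=10, default=1.0):
    """Multiply the SRRs of a patient's injury codes (lowest max_injuries, assumed)."""
    ratios = sorted(srr.get(code, default) for code in set(codes))
    probability = 1.0
    for ratio in ratios[:max_injuries]:
        probability *= ratio
    return probability


# Toy records (hypothetical outcomes, for illustration only).
records = [
    (["S224"], True),
    (["S224"], False),
    (["S224", "S320"], False),
    (["S320"], True),
    (["S320"], True),
]
inclusive = survival_risk_ratios(records)
exclusive = survival_risk_ratios(records, exclusive=True)
print(inclusive, exclusive)
print(iciss(["S224", "S320"], inclusive))
```

The patient with both codes counts toward the inclusive SRRs but is skipped for the exclusive SRRs, which is the only difference between the two approaches.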
The Institutional Review Board of Wonkwang University Hospital approved this study (WKUH 2019-11-004-001). The requirement for informed consent was waived due to the study’s observational nature and the deidentification of each patient.
The following table compares the main variables between the deceased and survived patients among all patients visiting the ED, whereas Table S3 compares ICD-10 codes between deceased and survived patients (878 features).
|Variables||Deceased (n=6351)||Survived (n=6,529,909)||P value|
|Age (years), n (%)||<.001|
|<1||7 (0.1)||91,585 (1.4)|
|1-4||36 (0.6)||711,018 (10.9)|
|5-9||41 (0.6)||425,503 (6.5)|
|10-14||53 (0.8)||289,072 (4.4)|
|15-19||190 (3.0)||329,548 (5.0)|
|20-24||271 (4.3)||412,019 (6.3)|
|25-29||272 (4.3)||433,147 (6.6)|
|30-34||275 (4.3)||386,563 (5.9)|
|35-39||277 (4.3)||414,559 (6.3)|
|40-44||323 (5.1)||386,061 (5.9)|
|45-49||488 (7.7)||442,058 (6.8)|
|50-54||573 (9.0)||456,820 (7.0)|
|55-59||692 (10.9)||480,396 (7.4)|
|60-64||576 (9.1)||362,980 (5.6)|
|65-69||505 (8.0)||244,581 (3.7)|
|70-74||488 (7.7)||199,392 (3.1)|
|75-79||587 (9.2)||196,499 (3.0)|
|80-84||420 (6.6)||149,770 (2.3)|
|85-89||196 (3.1)||81,181 (1.2)|
|90-94||52 (0.8)||29,114 (0.4)|
|95-99||16 (0.3)||7025 (0.1)|
|100-104||12 (0.2)||840 (0)|
|105-109||0 (0)||111 (0)|
|110-114||0 (0)||42 (0)|
|115-119||1 (0)||24 (0)|
|≥120||0 (0)||1 (0)|
|Initial KTASa, n (%)|
|Level 1||5214 (82.1)||10,941 (0.2)||<.001|
|Level 2||795 (12.5)||134,003 (2.1)||<.001|
|Level 3||219 (3.4)||1,044,687 (16.0)||<.001|
|Level 4||49 (0.8)||4,211,165 (64.5)||<.001|
|Level 5||1 (0)||858,996 (13.2)||<.001|
|Not classified||0 (0)||227 (0)||.99|
|Missing data||73 (1.1)||269,890 (4.1)||<.001|
|Intentionality, n (%)|
|Accidental, unintentional||4734 (74.5)||5,151,631 (78.9)||<.001|
|Suicide, intentional self-harm||399 (6.3)||28,347 (0.4)||<.001|
|Assault, violence||74 (1.2)||191,857 (2.9)||<.001|
|Other specified||217 (3.4)||21,874 (0.3)||<.001|
|Unspecified||707 (11.1)||82,701 (1.3)||<.001|
|Missing data||220 (3.5)||1,053,499 (16.1)||<.001|
|Injury mechanism, n (%)|
|Car accident||1073 (16.9)||575,270 (8.8)||<.001|
|Bike accident||155 (2.4)||130,385 (2.0)||.01|
|Motorcycle accident||579 (9.1)||128,419 (2.0)||<.001|
|Traffic accident-pedestrian, train, airplane, ship, etc||1521 (23.9)||173,313 (2.7)||<.001|
|Traffic accident-unknown||32 (0.5)||769 (0.0)||<.001|
|Fall||1790 (28.2)||379,735 (5.8)||<.001|
|Slipped||80 (1.3)||1,087,804 (16.7)||<.001|
|Struck by person or object||233 (3.7)||1,173,585 (18.0)||<.001|
|Firearm, cut, or pierced||149 (2.3)||787,042 (12.1)||<.001|
|Machine||46 (0.7)||62,910 (1.0)||<.001|
|Fire, flames, or heat||29 (0.5)||175,797 (2.7)||<.001|
|Drowning or nearly drowning||16 (0.3)||546 (0.0)||<.001|
|Poisoning||6 (0.1)||7,639 (0.1)||.73|
|Choking, hanging||105 (1.7)||3,817 (0.1)||<.001|
|Others-rape, electric||62 (1.0)||684,468 (10.5)||<.001|
|Unknown||255 (4.0)||104,911 (1.6)||<.001|
|Missing data||220 (3.5)||1,053,499 (16.1)||<.001|
|Emergent symptom, n (%)|
|Yes||6262 (98.6)||5,245,303 (80.3)||<.001|
|No||89 (1.4)||1,284,606 (19.7)||<.001|
|Unspecified||0 (0.0)||0 (0.0)|
|AVPUb scale, n (%)|
|Alert||364 (5.7)||5,340,780 (81.8)||<.001|
|Verbal response (drowsy)||238 (3.7)||41,488 (0.6)||<.001|
|Painful response (stupor, semicoma)||532 (8.3)||20,004 (0.3)||<.001|
|Unresponsive (coma)||4973 (78.3)||5,695 (0.1)||<.001|
|Unspecified response||244 (3.8)||1,121,942 (17.2)||<.001|
|Male sex, n (%)||4446 (70.0)||3,839,715 (58.8)||<.001|
|Systolic blood pressure, mean (SD)||125.98 (22.76)||132.39 (18.24)||<.001|
|Diastolic blood pressure, mean (SD)||76.71 (13.57)||79.83 (11.29)||<.001|
|Pulse rate per minute, mean (SD)||89.71 (15.50)||87.59 (15.92)||<.001|
|Respiratory rate per minute, mean (SD)||20.07 (3.00)||19.94 (2.84)||.001|
|Body temperature, mean (SD)||36.15 (0.93)||36.62 (0.40)||<.001|
|Oxygen saturation, mean (SD)||94.86 (9.72)||98.08 (1.56)||<.001|
aKTAS: Korean Triage and Acuity Scale.
bAVPU: Alert, Verbal, Painful, Unresponsive.
The following table summarizes the 5-fold cross-validation results. The AdaBoost model with all 921 features achieved the highest balanced accuracy (0.9801) and AUROC (0.9973) values among the seven candidates built from the three best models: AdaBoost, XGBoost, LightGBM, AdaBoost with XGBoost, AdaBoost with LightGBM, XGBoost with LightGBM, and the three models combined. In addition, the 921-feature model provided higher accuracy metrics than the 878-feature model with only the ICD-10 codes and the 43-feature model excluding the ICD-10 codes. Interestingly, the 43-feature model excluding the ICD-10 codes reached higher accuracy metrics than the 878-feature model with only the ICD-10 codes.
These results suggest that patient information, symptoms, and trauma causes predict mortality better than ICD-10 codes. We also compared the performance of traditional methods. Inclusive SRR, exclusive SRR, and KTAS exhibited lower balanced accuracies (0.9069, 0.9175, and 0.9619, respectively) and AUROCs (0.9345, 0.9554, and 0.9372, respectively).
|Model||Sensitivity, mean (SD)||Specificity, mean (SD)||Accuracy, mean (SD)||Balanced accuracy, mean (SD)||AUROCa, mean (SD)|
|921 features (including ICD-10b)|
|AdaBoostc||0.9713 (0.0060)||0.9890 (0.0040)||0.9801 (0.0035)||0.9801 (0.0035)||0.9973 (0.0005)|
|XGBoostd||0.9674 (0.0035)||0.9897 (0.0034)||0.9786 (0.0016)||0.9786 (0.0016)||0.9968 (0.0004)|
|LightGBMe||0.9678 (0.0034)||0.9898 (0.0034)||0.9788 (0.0015)||0.9788 (0.0015)||0.9968 (0.0005)|
|GBMf||0.7952 (0.0088)||0.9475 (0.0074)||0.8713 (0.0059)||0.8713 (0.0059)||0.9319 (0.0054)|
|ERTg||0.8944 (0.0106)||0.9248 (0.0110)||0.9095 (0.0063)||0.9095 (0.0063)||0.9542 (0.0051)|
|LRh||0.9514 (0.0052)||0.9933 (0.0030)||0.9723 (0.0026)||0.9723 (0.0026)||0.9717 (0.0027)|
|RFi||0.9310 (0.0116)||0.9667 (0.0086)||0.9488 (0.0206)||0.9488 (0.0116)||0.9872 (0.0034)|
|DNNj||0.9708 (0.0058)||0.9847 (0.0048)||0.9778 (0.0038)||0.9778 (0.0038)||0.9944 (0.0012)|
|AdaBoost+XGBoost||0.9675 (0.0036)||0.9899 (0.0034)||0.9787 (0.0016)||0.9787 (0.0016)||0.9970 (0.0005)|
|AdaBoost+LightGBM||0.9681 (0.0034)||0.9900 (0.0033)||0.9790 (0.0014)||0.9790 (0.0014)||0.9970 (0.0005)|
|XGBoost+LightGBM||0.9675 (0.0036)||0.9899 (0.0034)||0.9787 (0.0016)||0.9787 (0.0016)||0.9968 (0.0004)|
|AdaBoost+XGBoost+LightGBM||0.9675 (0.0036)||0.9899 (0.0034)||0.9787 (0.0016)||0.9787 (0.0016)||0.9970 (0.0005)|
|878 features (ICD-10 only)|
|AdaBoost||0.8261 (0.0073)||0.9429 (0.0070)||0.8845 (0.0053)||0.8845 (0.0053)||0.9448 (0.0056)|
|XGBoost||0.6801 (0.0172)||0.9722 (0.0065)||0.8261 (0.0095)||0.8261 (0.0095)||0.8929 (0.0051)|
|LightGBM||0.6877 (0.0140)||0.9717 (0.0071)||0.8297 (0.0072)||0.8297 (0.0072)||0.8939 (0.0056)|
|GBM||0.7952 (0.0088)||0.9475 (0.0074)||0.8713 (0.0059)||0.8713 (0.0059)||0.9319 (0.0054)|
|ERT||0.8944 (0.0106)||0.9248 (0.0110)||0.9096 (0.0063)||0.9096 (0.0063)||0.9542 (0.0051)|
|LR||0.7535 (0.0110)||0.9540 (0.0060)||0.8537 (0.0055)||0.8537 (0.0054)||0.9401 (0.0066)|
|RF||0.6615 (0.0424)||0.9724 (0.0125)||0.8169 (0.0185)||0.8169 (0.0185)||0.9265 (0.0070)|
|DNN||0.9329 (0.0158)||0.9788 (0.0126)||0.9559 (0.0059)||0.9559 (0.0059)||0.9867 (0.0023)|
|AdaBoost+XGBoost||0.6931 (0.0101)||0.9719 (0.0068)||0.8325 (0.0060)||0.8325 (0.0059)||0.9408 (0.0047)|
|AdaBoost+LightGBM||0.6960 (0.0124)||0.9715 (0.0070)||0.8337 (0.0068)||0.8337 (0.0068)||0.9408 (0.0048)|
|XGBoost+LightGBM||0.6824 (0.0150)||0.9719 (0.0068)||0.8271 (0.0089)||0.8271 (0.0089)||0.8939 (0.0055)|
|AdaBoost+XGBoost+LightGBM||0.6908 (0.0104)||0.9718 (0.0070)||0.8312 (0.0063)||0.8313 (0.0063)||0.9405 (0.0048)|
|43 features (excluding ICD-10)|
|AdaBoost||0.9707 (0.0050)||0.9854 (0.0062)||0.9781 (0.0020)||0.9781 (0.0020)||0.9965 (0.0007)|
|XGBoost||0.9658 (0.0040)||0.9889 (0.0039)||0.9773 (0.0014)||0.9773 (0.0014)||0.9960 (0.0005)|
|LightGBM||0.9661 (0.0040)||0.9887 (0.0041)||0.9774 (0.0013)||0.9774 (0.0013)||0.9961 (0.0004)|
|GBM||0.9729 (0.0036)||0.9858 (0.0054)||0.9793 (0.0021)||0.9793 (0.0021)||0.9965 (0.0006)|
|ERT||0.9712 (0.0041)||0.9828 (0.0052)||0.9770 (0.0024)||0.9770 (0.0024)||0.9937 (0.0011)|
|LR||0.9448 (0.0053)||0.9921 (0.0029)||0.9685 (0.0023)||0.9685 (0.0023)||0.9941 (0.0009)|
|RF||0.9079 (0.0089)||0.9503 (0.0107)||0.9291 (0.0061)||0.9291 (0.0062)||0.9818 (0.0018)|
|DNN||0.8805 (0.0482)||0.8903 (0.0465)||0.8854 (0.0104)||0.8854 (0.0104)||0.9424 (0.0050)|
|AdaBoost+XGBoost||0.9660 (0.0039)||0.9888 (0.0040)||0.9774 (0.0013)||0.9774 (0.0013)||0.9962 (0.0005)|
|AdaBoost+LightGBM||0.9661 (0.0039)||0.9890 (0.0041)||0.9775 (0.0012)||0.9775 (0.0012)||0.9962 (0.0005)|
|XGBoost+LightGBM||0.9659 (0.0039)||0.9889 (0.0040)||0.9774 (0.0012)||0.9774 (0.0012)||0.9960 (0.0005)|
|AdaBoost+XGBoost+LightGBM||0.9661 (0.0039)||0.9891 (0.0041)||0.9776 (0.0013)||0.9776 (0.0013)||0.9961 (0.0005)|
aAUROC: area under the receiver operating characteristic curve.
bICD-10: International Classification of Disease 10th Revision.
cAdaBoost: adaptive boosting.
dXGBoost: extreme gradient boosting.
eLightGBM: light gradient boosting machine.
fGBM: gradient boosting machine.
gERT: extremely random trees.
hLR: logistic regression.
iRF: random forest.
jDNN: deep neural network.
kSRR: survival risk ratio.
lKTAS: Korean Triage and Acuity Scale.
Ranked Feature Importance: Explainable AI
Next, we conducted a feature importance analysis to confirm each feature’s contribution, ranking the normalized feature importance from the AdaBoost model, which showed the best performance. Age and unresponsiveness (coma) were the top two mortality predictors among visiting ED patients, followed by oxygen saturation, S224 (multiple rib fractures), painful response (stupor, semicoma), S320 (lumbar vertebra fracture), and KTAS Level 2. Interestingly, only 49 among the 921 features had nonzero importance values, indicating that the other 872 features did not contribute to mortality prediction. Table S4 summarizes the total ranked normalized feature importance values.
The table below summarizes the results from the isolated testing data set (n=1,307,252). The testing data results corroborate that the 921-feature AdaBoost model achieved the highest balanced accuracy (0.9813) and AUROC (0.9974) values among the seven candidates: AdaBoost, XGBoost, LightGBM, AdaBoost with XGBoost, AdaBoost with LightGBM, XGBoost with LightGBM, and the three-model combination. Similar to the cross-validation results, the 921-feature model provided higher accuracy metrics than the 878-feature model with only ICD-10 codes and the 43-feature model excluding ICD-10 codes. Furthermore, our selected model also performed better than the traditional inclusive SRR, exclusive SRR, and KTAS methods (see Table S5). The similarity between the cross-validation and testing data results suggests minimal overfitting or underfitting. The accompanying ROC curve figure compares the selected AdaBoost model with the three traditional models (left) and the selected AdaBoost model across the 921-, 878-, and 43-feature sets (right).
Finally, we compared our model with a cutting-edge AI model that was designed for in-hospital mortality prediction based on the NEDIS data set. The in-hospital mortality model’s balanced accuracy and AUROC values were lower, regardless of the feature set. Notably, the in-hospital mortality model with all 921 features obtained a balanced accuracy of 0.9614 and an AUROC of 0.9929, and the model with 43 features excluding ICD-10 yielded a balanced accuracy of 0.9648 and an AUROC of 0.9923. Interestingly, when the in-hospital mortality model used only the 878 ICD-10 features, the values dropped markedly to a balanced accuracy of 0.6298 and an AUROC of 0.7675.
|Model||Sensitivity||Specificity||Accuracy||Balanced accuracy||AUROCa|
|921 features (including ICD-10b)|
|In-hospital mortality AIk ||0.9468||0.9761||0.9760||0.9614||0.9929|
|878 features (ICD-10 only)|
|In-hospital mortality AI ||0.2838||0.9751||0.9751||0.6298||0.7675|
|43 features (excluding ICD-10)|
|In-hospital mortality AI ||0.9441||0.9855||0.9855||0.9648||0.9923|
aAUROC: area under the receiver operating characteristic curve.
bICD-10: International Classification of Disease 10th revision.
cAdaBoost: adaptive boosting.
dXGBoost: extreme gradient boosting.
eLightGBM: light gradient boosting machine.
fGBM: gradient boosting machine.
gERT: extremely random trees.
hLR: logistic regression.
iRF: random forest.
jDNN: deep neural network.
kAI: artificial intelligence.
lSRR: survival risk ratio.
mKTAS: Korean Triage and Acuity Scale.
AI-Driven Public Website Deployment
We launched our final AI model on a public website to allow access to the mortality prediction results among visiting ED patients. Figure S1A displays the web interface for entering information. A user inputs age, gender, intentionality, injury mechanism, emergent symptoms, AVPU scale, initial KTAS, systolic blood pressure, diastolic blood pressure, pulse rate per minute, respiratory rate per minute, body temperature, oxygen saturation, and ICD-10 codes. For ICD-10 codes, a user can input multiple codes separated by commas (eg, S072, S224, T083). After entering information into the web app, the user can obtain the mortality prediction results (see Figure S1B for an example). The prediction results also include the mortality probability.
Toward a Comprehensive Model
Our previous study revealed that blood pressure, heart rate, body temperature, and other vital signs weakened the previous AI model’s performance, whereas incorporating vital signs strengthened our present AI model. This observation implies that ED and in-hospital mortality patients exhibit differing data distributions. Patients who died after ICU or ward admission and not in the ED were labeled survivors in the present AI model; therefore, the present AI model predicts a more severe mortality type. In future studies, we plan to design a more comprehensive model incorporating both in-hospital and ED mortality. We propose a new pipeline for predicting ED and in-hospital mortality based on our present AI model and the previous two models.
This study developed an AI model that accurately predicts mortality among visiting ED patients. Our final AI model achieved high metrics of 0.9738 sensitivity, 0.9888 specificity, 0.9887 accuracy, 0.9813 balanced accuracy, and 0.9974 AUROC. In addition, the proposed AI model outperformed the traditional inclusive SRR, exclusive SRR, and KTAS models and the previously developed in-hospital mortality prediction AI model. We identified several significant mortality predictors through the feature importance analysis, including age, AVPU scale, multiple rib fractures, lumbar vertebra fractures, and KTAS Level 2. Furthermore, we devised a risk calculator leveraging our AI model, demonstrating its substantial clinical potential for triage and diagnosis.
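Handling the comma-separated ICD-10 input might look like the sketch below. The website's actual implementation is not described in the text, so this is an assumption throughout: the function names are hypothetical, the three-code vocabulary is a toy stand-in for the 878 ICD-10 features, and the binary indicator encoding is assumed.

```python
def parse_icd10_input(text):
    """Split a comma-separated string into cleaned, de-duplicated ICD-10 codes."""
    codes = []
    for token in text.split(","):
        code = token.strip().upper()
        if code and code not in codes:
            codes.append(code)
    return codes


def multi_hot(codes, vocabulary):
    """Encode codes as a 0/1 vector over a fixed vocabulary (assumed feature layout)."""
    index = {code: position for position, code in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for code in codes:
        if code in index:
            vector[index[code]] = 1
    return vector


# Toy vocabulary (hypothetical); the study's ICD-10 feature set has 878 entries.
vocabulary = ["S224", "S320", "T083"]
codes = parse_icd10_input("S072, S224, T083")
features = multi_hot(codes, vocabulary)
print(codes, features)
```

Codes outside the vocabulary, such as S072 in this toy example, are ignored in the sketch; how the deployed tool treats unknown codes is not stated in the text.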
We compared AdaBoost, XGBoost, LightGBM, GBM, ERT, LR, RF, and DNN models. After determining that AdaBoost, XGBoost, and LightGBM were the top three models, we combined them using an ensemble approach. During the evaluation, we considered balanced accuracy as the primary metric due to the data imbalance. The three best single models are all based on boosting, which is itself an ensemble approach. Boosting uses a stagewise additive method in which multiple weak models are trained and combined into one stronger model. Specifically, AdaBoost grows decision trees as weak models and, after each prediction stage, assigns higher weight values to the misclassified samples so that subsequent trees focus on them.
Although XGBoost and LightGBM also utilize boosting, they differ from AdaBoost in that they use gradient boosting. The gradient boosting algorithm fits each new model to the previous model’s residual errors. Thus, XGBoost computes the residuals and builds decision trees by selecting features, finding optimal splits, estimating leaf node values, and applying regularization. The predictions are updated and the residuals are recalculated at each iteration. LightGBM resembles XGBoost in many aspects, but it executes faster while maintaining high accuracy by using gradient-based one-side sampling (GOSS) and exclusive feature bundling. In contrast to GOSS in LightGBM, XGBoost uses presorted and histogram-based algorithms for computing the best split.
We also assessed GBM during the initial single-model evaluation. Both LightGBM and GBM models are founded on gradient boosting frameworks but differ in tree construction, feature discretization, gradient computation, and memory usage. Notably, each machine learning model performs better under different data characteristics. For instance, LR may work well with a linear association between features and a target, whereas a decision tree is more effective for nonlinear relationships. Even models with similar operations, such as GBM and LightGBM, can produce different prediction outcomes due to slightly different tree constructions and data characteristics. Therefore, we evaluated various machine learning models to identify the best-fit solution for a particular problem type: predicting trauma mortality for all patients visiting the ED.
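The reweighting step that AdaBoost performs after each weak model can be sketched as follows. This is the textbook discrete AdaBoost update, shown only for illustration; the function name and toy data are hypothetical, and the learning rate used in the study is not applied here.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One discrete AdaBoost round: return the model weight and updated sample weights.

    Labels and predictions are +1 or -1.
    """
    total = sum(weights)
    error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    # Clip the error so the logarithm stays finite for perfect or useless weak models.
    error = min(max(error, 1e-10), 1 - 1e-10)
    alpha = 0.5 * math.log((1 - error) / error)
    # Misclassified samples are scaled up; correctly classified samples are scaled down.
    updated = [
        w * math.exp(alpha if t != p else -alpha)
        for w, t, p in zip(weights, y_true, y_pred)
    ]
    normalizer = sum(updated)
    return alpha, [w / normalizer for w in updated]


# Toy round (hypothetical): four samples with equal weights, the last one misclassified.
weights = [0.25, 0.25, 0.25, 0.25]
y_true = [1, 1, -1, -1]
y_pred = [1, 1, -1, 1]
alpha, new_weights = adaboost_round(weights, y_true, y_pred)
print(alpha, new_weights)
```

After the update, the single misclassified sample carries half of the total weight, so the next weak model is pushed to classify it correctly.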
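The residual-fitting loop at the heart of gradient boosting can be illustrated with one-dimensional decision stumps. This is a deliberately simplified sketch using squared-error residuals on a single feature with at least two distinct values; it is not how XGBoost or LightGBM are implemented, it omits regularization, GOSS, and feature bundling, and all names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the threshold split that minimizes the squared error of the residuals."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue  # a valid split needs samples on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Fit stumps stagewise, each one to the residuals of the current predictions."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return base, stumps, predictions


# Toy data (hypothetical): one feature, with outcomes switching from 0 to 1 above x = 3.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
base, stumps, predictions = boost(xs, ys)
print([round(p, 3) for p in predictions])
```

Each round shrinks the remaining residuals by a fraction set by the learning rate, which is why a small learning rate is typically paired with a larger number of trees.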
Finally, we enhanced prediction performance by considering all possible combinations of the three best models. In our final ensemble model, the optimal hyperparameters were a maximum depth of 6, a learning rate of 0.01, and 400 tree estimators. AdaBoost’s optimal hyperparameters were 400 tree estimators, a learning rate of 0.1, a maximum depth of 1, and balanced class weights. Comparatively, LightGBM’s optimal hyperparameters were a maximum depth of 3, a learning rate of 0.01, and 400 tree estimators.
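To make the combination search concrete, the sketch below enumerates every subset of three models and scores each one by balanced accuracy. It assumes soft voting (averaging predicted probabilities), which is one common way of combining classifiers and is used here only for illustration. The validation labels and per-model probabilities are hypothetical values, not results from our data.

```python
from itertools import combinations

def soft_vote(prob_lists):
    """Average the predicted probabilities of several models, sample by sample."""
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]

def balanced_accuracy(y_true, probs, threshold=0.5):
    """Mean of sensitivity and specificity at the given probability threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 0)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)

# Hypothetical validation labels and per-model predicted probabilities
y_val = [0, 0, 0, 0, 1, 1]
model_probs = {
    "AdaBoost": [0.1, 0.2, 0.6, 0.3, 0.7, 0.4],
    "XGBoost":  [0.2, 0.1, 0.3, 0.6, 0.6, 0.7],
    "LightGBM": [0.1, 0.3, 0.4, 0.2, 0.8, 0.6],
}

# Score every single model, every pair, and the full triple
scores = {}
for size in (1, 2, 3):
    for names in combinations(model_probs, size):
        averaged = soft_vote([model_probs[n] for n in names])
        scores[names] = balanced_accuracy(y_val, averaged)

best_combo = max(scores, key=scores.get)
print(best_combo, scores[best_combo])
```

With three candidate models there are only seven subsets to evaluate, so an exhaustive search of this kind is inexpensive once the individual models have been trained.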
Our AI model has several advantages for clinical practice. First, our proposed model is well suited to predicting outcomes for patients with severe trauma who may not survive until admission or surgery, most of whom are unable to undergo CT or magnetic resonance imaging. Moreover, our feature importance analysis identified different prominent diagnostic codes than those of the previous AI model for predicting in-hospital mortality. This finding implies that patients who die in the ED have clinical data distributions distinct from those of patients with in-hospital mortality. We also used the present test data set to assess the previous in-hospital mortality AI model. The current model was superior, likely because its training data set is specific to ED mortality. Second, we did not use AIS codes, which require an expert or exact diagnoses based on CT or surgery; thus, applying the model is not a time-consuming process. All variables in our AI model are available promptly after a portable X-ray or point-of-care ultrasound. Finally, our AI model’s excellent performance could support the efficient allocation of medical resources. As our risk calculator tool is accessible through a mobile web app, clinicians can use it without time or location limitations.
We used 5,229,008 patient records for model training and cross-validation and 1,307,252 records for evaluation. To our knowledge, this is the most extensive study on developing an AI model for predicting mortality in trauma patients visiting the ED by incorporating ICD-10 codes and other clinical variables; such a large data set supports the generalizability of the model. In addition, we used SMOTE and an optimized class weight search to mitigate the imbalanced data distribution. These techniques reduced overfitting, as reflected in the smaller difference between the cross-validation and test results.
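The two imbalance-handling ideas mentioned above can be illustrated with a short Python sketch. The first function computes the common "balanced" class weight heuristic, n_samples / (n_classes × class_count), which is assumed here only as a plausible starting point for a class weight search. The second generates one synthetic minority sample by interpolating between a minority point and one of its nearest minority neighbors, which is the core idea of SMOTE. The function names and toy data are hypothetical.

```python
import random

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}

def smote_sample(minority, k=2, rng=None):
    """Create one synthetic minority point by interpolating toward a near neighbor."""
    if rng is None:
        rng = random.Random(0)
    base = rng.choice(minority)
    # k nearest minority neighbors of the base point (squared Euclidean distance)
    others = [p for p in minority if p is not base]
    neighbors = sorted(
        others, key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p))
    )[:k]
    neighbor = rng.choice(neighbors)
    gap = rng.random()  # position along the segment from base to neighbor
    return [a + gap * (b - a) for a, b in zip(base, neighbor)]

# Hypothetical, heavily imbalanced labels: 98 survivors (0) and 2 deaths (1)
labels = [0] * 98 + [1] * 2
weights = balanced_class_weights(labels)

# Hypothetical minority-class feature vectors
minority = [[1.0, 2.0], [2.0, 3.0], [1.5, 2.5]]
synthetic = smote_sample(minority, k=2, rng=random.Random(42))
print(weights, synthetic)
```

In this toy example the rare class receives a weight roughly 49 times that of the common class, and the synthetic point always lies on a line segment between two existing minority points.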
Our team previously developed two AI models for predicting in-hospital mortality from various input features [, ]. In our first study [ ], we rearranged AIS codes relative to 46 anatomical regions, which was considerably more differentiated than the Injury Severity Score (ISS) system’s conventional six regions. We anticipated that the AI model would provide more appropriate weights for each anatomical organ such as the pancreas, rib, or liver. Our second study [ ] used the NEDIS data set without any AIS information. Therefore, we used ICD-10 codes, procedure codes, KTAS, and other clinical variables.
We excluded ED mortality patients in both studies as they may have received an insufficient diagnostic workup. Postmortem CT and autopsy are not commonly performed in South Korea. Patients discharged home from the ED may also have undergone less thorough diagnosis than admitted patients. Additionally, some patients may die after ICU or ward admission. We did not count these patients as ED mortality. ED mortality patients may have more severe injuries that preclude admission. Thus, we postulated that ED mortality must be predicted using alternative features.
Two recent systematic reviews compared machine learning models for predicting mortality and decision support [, ]. Zhang et al [ ] discussed six studies using machine learning models based on national databases. However, their training sets ranged from 12,640 to 799,233 patients [ - ]. Our model incorporated data from over 6.5 million patients, making this the largest cohort among these studies. Moreover, our model achieved a higher performance (AUROC of 0.99) than these previous studies (AUROC of 0.89 to 0.95) [ - ]. Kim et al [ ] introduced a neural network model using 408,316 patients from the National Trauma Database with a primarily ED mortality outcome. They reported an AUROC of 0.86 for the neural network using age, systolic blood pressure, respiration rate, heart rate, and GCS or simplified consciousness scores [ ] and an AUROC of 0.93 when incorporating ISS. Our present result (AUROC of 0.99) substantially exceeds that of the previous study. Furthermore, the ICD-10 code is more practical than the ISS in the ED because the ISS requires an expert such as a trauma coordinator, whereas any clinician can determine an ICD-10 code. In a validation study including 934,053 patients from the American College of Surgeons Trauma Quality Improvement Program database, Maurer et al [ ] reported an AUROC of 0.93 in penetrating injury and 0.88 in blunt injury for predicting mortality. However, they also excluded ED mortality, similar to our previous AI models [ , ].
Limitations and Future Work
This study has several limitations. First, this is a retrospective study despite the substantial quantity of patient data. Therefore, a further prospective study is needed to avoid potential selection and survival bias. Second, we did not perform external validation. Our study only used training data derived from patients in South Korea; thus, it is unclear whether our model could be adapted to other countries. In future work, we plan to conduct an external validation study using data from another country to develop a global version of the model. Third, our primary outcome was ED mortality, not in-hospital or overall mortality. Since our previous AI model excluded patients with ED mortality, the two AI models together could help clinicians predict different mortality types. Fourth, some data, such as age, were provided as categorical variables in NEDIS. However, a sufficiently large data set may enable us to overcome this issue. Fifth, NEDIS does not include AIS or ISS, so we could not use this information. In future work, we plan to develop an ensemble model incorporating our previously developed AIS code–based AI model [ ].
Our proposed AI model for predicting ED mortality achieved exceptionally high accuracy. This model is derived from a population-based data set in South Korea and provides better insight into trauma care and systems, complementing our previous AI models [, ]. In future studies, we must consider data from various ethnic groups and integrate our previous AI models.
This work was supported by the Korea Medical Device Development Fund grant funded by the Korea government (the Ministry of Science and ICT, the Ministry of Trade, Industry and Energy, the Ministry of Health & Welfare, the Ministry of Food and Drug Safety) (Project Number: 1711138152, RS-2020-KD000095).
The data are not publicly available due to National Emergency Department Information System (NEDIS) policy restrictions and belong to Korea’s National Emergency Medical Center (NEMC). The NEMC provides researchers with a deidentified NEDIS data set for nonprofit academic research. Details of accessing the raw data and guide are available at the NEMC website.
All authors wrote the manuscript and created the figures. Concept and design: WSK, DWK, and JL. Simulation: SSL. Statistical analysis: SHS, JK, STJ, and DKY. Interpretation of data: WSK, DWK, and JL. All authors critically reviewed and agreed to the submission of the final manuscript. SSL, WSK, and DWK contributed equally to this work and should be considered co-first authors.
Conflicts of Interest
Variables included in the artificial intelligence prediction model (Table S1); summary of data sets (Table S2); ICD-10 codes between deceased and alive patients (Table S3); normalized feature importance of AdaBoost model (Table S4); survival risk ratio of ICD-10 codes (Table S5); screenshots of the web app for mortality prediction in trauma patients (Figure S1). DOCX File, 638 KB
- Park Y, Lee GJ, Lee MA, Choi KK, Gwak J, Hyun SY, et al. Major causes of preventable death in trauma patients. J Trauma Inj 2021 Dec;34(4):225-232
- Kim OH, Go SJ, Kwon OS, Park C, Yu B, Chang SW, et al. Part 2. Clinical practice guideline for trauma team composition and trauma cardiopulmonary resuscitation from the Korean Society of Traumatology. J Trauma Inj 2020 Jun;33(2):63-73
- Jung PY, Yu B, Park C, Chang SW, Kim OH, Kim M, et al. Clinical practice guideline for the treatment of traumatic shock patients from the Korean Society of Traumatology. J Trauma Inj 2020 Mar;33(1):1-12
- Kang WS, Chung H, Ko H, Kim NY, Kim DW, Cho J, et al. Artificial intelligence to predict in-hospital mortality using novel anatomical injury score. Sci Rep 2021 Dec 07;11(1):23534
- Lee S, Kang WS, Seo S, Kim DW, Ko H, Kim J, et al. Model for predicting in-hospital mortality of physical trauma patients using artificial intelligence techniques: nationwide population-based study in Korea. J Med Internet Res 2022 Dec 13;24(12):e43757 [https://www.jmir.org/2022/12/e43757/]
- Moons KGM, Altman DG, Reitsma JB, Ioannidis JPA, Macaskill P, Steyerberg EW, et al. Transparent Reporting of a multivariable Prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med 2015 Jan 06;162(1):W1-73 [https://www.acpjournals.org/doi/abs/10.7326/M14-0698?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub 0pubmed]
- Luo W, Phung D, Tran T, Gupta S, Rana S, Karmakar C, et al. Guidelines for developing and reporting machine learning predictive models in biomedical research: a multidisciplinary view. J Med Internet Res 2016 Dec 16;18(12):e323 [https://www.jmir.org/2016/12/e323/]
- Kelly CA, Upex A, Bateman D. Comparison of consciousness level assessment in the poisoned patient using the alert/verbal/painful/unresponsive scale and the Glasgow Coma Scale. Ann Emerg Med 2004 Aug;44(2):108-113
- McNarry AF, Goldhill DR. Simple bedside assessment of level of consciousness: comparison of two simple assessment scales with the Glasgow Coma scale. Anaesthesia 2004 Jan;59(1):34-37 [https://onlinelibrary.wiley.com/resolve/openurl?genre=article&sid=nlm:pubmed&issn=0003-2409&date=2004&volume=59&issue=1&spage=34]
- Ryu J, Min M, Lee D, Yeom S, Lee S, Wang I, et al. Changes in relative importance of the 5-level triage system, Korean Triage and Acuity Scale, for the disposition of emergency patients induced by forced reduction in its level number: a multi-center registry-based retrospective cohort study. J Korean Med Sci 2019 Apr 15;34(14):e114 [https://jkms.org/DOIx.php?id=10.3346/jkms.2019.34.e114]
- Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: Synthetic Minority Over-sampling Technique. J Artif Intell Res 2002 Jun 01;16:321-357
- Jiang H, Nachum O. Identifying and correcting label bias in machine learning. 2020 Presented at: 23rd International Conference on Artificial Intelligence and Statistics (AISTATS); August 26-28, 2020; Palermo, Italy
- Mathanker S, Weckler P, Bowser T, Wang N, Maness N. AdaBoost classifiers for pecan defect classification. Comput Electron Agric 2011 Jun;77(1):60-68
- Chen T, Guestrin C. Xgboost: A scalable tree boosting system. 2016 Presented at: KDD '16: 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; August 13-17, 2016; San Francisco, CA
- Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W. Lightgbm: a highly efficient gradient boosting decision tree. 2017 Presented at: 31st Conference on Neural Information Processing Systems (NIPS 2017); December 4-9, 2017; Long Beach, CA
- Friedman JH. Greedy function approximation: a gradient boosting machine. Ann Statist 2001 Oct 1;29(5):1189-1232
- Geurts P, Ernst D, Wehenkel L. Extremely randomized trees. Mach Learn 2006 Mar 2;63(1):3-42
- Hosmer Jr DW, Lemeshow S, Sturdivant RX. Applied logistic regression. Hoboken, NJ: John Wiley & Sons; 2013.
- Flaxman AD, Vahdatpour A, Green S, James SL, Murray CJ, Population Health Metrics Research Consortium (PHMRC). Random forests for verbal autopsy analysis: multisite validation study using clinical diagnostic gold standards. Popul Health Metr 2011 Aug 04;9:29 [https://pophealthmetrics.biomedcentral.com/articles/10.1186/1478-7954-9-29]
- Bergeron E, Simons R, Linton C, Yang F, Tallon JM, Stewart TC, et al. Canadian benchmarks in trauma. J Trauma 2007 Feb;62(2):491-497
- Kim Y, Jung KY. Utility of the international classification of diseases injury severity score: detecting preventable deaths and comparing the performance of emergency medical centers. J Trauma 2003 Apr;54(4):775-780
- Final AI model. Kyung Hee University. URL: http://ai-wm.khu.ac.kr/mortality_visiting_ED [accessed 2023-08-09]
- Baur D, Gehlen T, Scherer J, Back DA, Tsitsilonis S, Kabir K, et al. Decision support by machine learning systems for acute management of severely injured patients: a systematic review. Front Surg 2022 Oct 10;9:924810 [https://europepmc.org/abstract/MED/36299574]
- Zhang T, Nikouline A, Lightfoot D, Nolan B. Machine learning in the prediction of trauma outcomes: a systematic review. Ann Emerg Med 2022 Nov;80(5):440-455
- Hunter A, Kennedy L, Henry J, Ferguson I. Application of neural networks and sensitivity analysis to improved prediction of trauma survival. Comput Methods Programs Biomed 2000 May;62(1):11-19 [https://core.ac.uk/reader/1644813?utm_source=linkout]
- Kim D, You S, So S, Lee J, Yook S, Jang DP, et al. A data-driven artificial intelligence model for remote triage in the prehospital environment. PLoS One 2018 Oct 23;13(10):e0206006 [https://dx.plos.org/10.1371/journal.pone.0206006]
- Li Y, Liu L, Chiu W, Jian W. Neural network modeling for surgical decisions on traumatic brain injury patients. Int J Med Inform 2000 Jan;57(1):1-9
- Maurer LR, Bertsimas D, Bouardi HT, El Hechi M, El Moheb M, Giannoutsou K, et al. Trauma outcome predictor: an artificial intelligence interactive smartphone tool to predict outcomes in trauma patients. J Trauma Acute Care Surg 2021 Jul 01;91(1):93-99
- Shi H, Hwang S, Lee K, Lin C. In-hospital mortality after traumatic brain injury surgery: a nationwide population-based comparison of mortality predictors used in artificial neural network and logistic regression models. J Neurosurg 2013 Apr;118(4):746-752
- Tsiklidis EJ, Sims C, Sinno T, Diamond SL. Using the National Trauma Data Bank (NTDB) and machine learning to predict trauma patient mortality at admission. PLoS One 2020 Nov 17;15(11):e0242166 [https://dx.plos.org/10.1371/journal.pone.0242166]
- National Emergency Medical Center. URL: https://dw.nemc.or.kr [accessed 2023-08-09]
Abbreviations
AdaBoost: adaptive boosting
AI: artificial intelligence
AIS: Abbreviated Injury Scale
AUROC: area under the receiver operating characteristic curve
AVPU: Alert, Verbal, Painful, Unresponsive
CT: computed tomography
DNN: deep neural network
ED: emergency department
ERT: extremely randomized trees
GBM: gradient boosting machine
GCS: Glasgow Coma Scale
GOSS: gradient-based one-side sampling
ICD-10: International Classification of Diseases 10th Revision
ICISS: International Classification of Diseases Severity Score
ICU: intensive care unit
ISS: Injury Severity Score
KTAS: Korean Triage and Acuity Scale
LightGBM: light gradient boosting machine
LR: logistic regression
NEDIS: National Emergency Department Information System
RF: random forest
ROC: receiver operating characteristic
SMOTE: synthetic minority oversampling technique
SRR: survival risk ratio
TRIPOD: Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis
XGBoost: extreme gradient boosting
Edited by T de Azevedo Cardoso; submitted 24.05.23; peer-reviewed by W Gray, C Jeong, L Hua; comments to author 05.07.23; revised version received 18.07.23; accepted 03.08.23; published 29.08.23
Copyright
©Seungseok Lee, Wu Seong Kang, Do Wan Kim, Sang Hyun Seo, Joongsuck Kim, Soon Tak Jeong, Dong Keon Yon, Jinseok Lee. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 29.08.2023.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.