This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
Cardiac surgery–associated acute kidney injury (CSA-AKI) is a major complication following pediatric cardiac surgery and is associated with increased morbidity and mortality. The early prediction of CSA-AKI before and immediately after surgery could significantly improve the implementation of preventive and therapeutic strategies during the perioperative period. However, there is limited clinical information on how to identify pediatric patients at high risk of CSA-AKI.
The study aims to develop and validate machine learning models to predict the development of CSA-AKI in the pediatric population.
This retrospective cohort study enrolled patients aged 1 month to 18 years who underwent cardiac surgery with cardiopulmonary bypass at 3 medical centers of Central South University in China. CSA-AKI was defined according to the 2012 Kidney Disease: Improving Global Outcomes criteria. Feature selection was applied separately to 2 data sets: the preoperative data set and the combined preoperative and intraoperative data set. Multiple machine learning algorithms were tested, including K-nearest neighbor, naive Bayes, support vector machines, random forest, extreme gradient boosting (XGBoost), and neural networks. The best performing model was identified in cross-validation by using the area under the receiver operating characteristic curve (AUROC). Model interpretations were generated using the Shapley additive explanations (SHAP) method.
A total of 3278 patients from one of the centers were used for model derivation, while 585 patients from another 2 centers served as the external validation cohort. CSA-AKI occurred in 564 (17.2%) patients in the derivation cohort and 51 (8.7%) patients in the external validation cohort. Among the considered machine learning models, the XGBoost models achieved the best predictive performance in cross-validation. The AUROC of the XGBoost model using only the preoperative variables was 0.890 (95% CI 0.876-0.906) in the derivation cohort and 0.857 (95% CI 0.800-0.903) in the external validation cohort. When the intraoperative variables were included, the AUROC increased to 0.912 (95% CI 0.899-0.924) and 0.889 (95% CI 0.844-0.920) in the 2 cohorts, respectively. The SHAP method revealed that baseline serum creatinine level, perfusion time, body length, operation time, and intraoperative blood loss were the top 5 predictors of CSA-AKI.
The interpretable XGBoost models provide practical tools for the early prediction of CSA-AKI, which are valuable for risk stratification and perioperative management of pediatric patients undergoing cardiac surgery.
An increasing number of pediatric patients worldwide undergo cardiac surgery each year for various reasons, including congenital heart disease and acquired cardiac conditions [
The early prediction of CSA-AKI could significantly improve the implementation of preventive and therapeutic strategies during the perioperative period. Specifically, preoperative prediction could facilitate surgical risk assessment and prevention of CSA-AKI, and early postoperative prediction could help with the early identification of CSA-AKI for proactive interventions [
The widespread use of machine learning to analyze clinical data derived from electronic health records offers considerable advantages for establishing prediction models. Machine learning is a scientific discipline that uses computer algorithms and learns from data with minimal human intervention [
This study includes patients from 3 distinct medical centers of Central South University in China. The derivation cohort comprised patients admitted at the Second Xiangya Hospital between January 2015 and March 2022. The external validation cohort consisted of patients admitted at Xiangya Hospital between January 2016 and December 2021 and patients admitted at the Third Xiangya Hospital between January 2015 and December 2021.
This study follows the Declaration of Helsinki and the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis statement [
This study included all pediatric patients aged between 1 month and 18 years who underwent cardiac surgery with cardiopulmonary bypass. We included patients with at least one serum creatinine (SCr) measurement before surgery and another within 7 days after surgery. We excluded patients with congenital renal malformation, a preoperative estimated glomerular filtration rate (eGFR) of 15 mL/min/1.73 m² or lower, or multiple surgeries within 7 days. We calculated eGFR using the modified Schwartz equation or, if body length was missing, the Full Age Spectrum equation [
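As an illustration of the eGFR exclusion rule, the sketch below assumes the commonly used bedside form of the modified Schwartz equation, eGFR = 0.413 × body length (cm) / SCr (mg/dL), and a conversion of 88.4 µmol/L per mg/dL. The function names are hypothetical, the exact variant used in the study is not stated here, and the Full Age Spectrum fallback for missing body length is not shown.

```python
UMOL_PER_MGDL = 88.4  # serum creatinine: 1 mg/dL = 88.4 µmol/L


def schwartz_egfr(height_cm: float, scr_umol_l: float) -> float:
    """Bedside Schwartz eGFR in mL/min/1.73 m^2 (assumed form: 0.413 * height / SCr)."""
    scr_mg_dl = scr_umol_l / UMOL_PER_MGDL
    return 0.413 * height_cm / scr_mg_dl


def excluded_for_low_egfr(height_cm: float, scr_umol_l: float,
                          threshold: float = 15.0) -> bool:
    """Exclusion rule from the text: preoperative eGFR of 15 or lower."""
    return schwartz_egfr(height_cm, scr_umol_l) <= threshold


if __name__ == "__main__":
    # Example numbers only: the derivation-cohort medians for body length
    # and baseline creatinine, not the values of any single patient.
    print(round(schwartz_egfr(79, 25.3), 1))
    print(excluded_for_low_egfr(79, 25.3))
```

Because SCr in this data set is reported in µmol/L, the unit conversion is the step most likely to go wrong in practice, which is why it is isolated in a named constant.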
Flow diagram of patient selection. eGFR: estimated glomerular filtration rate; SCr: serum creatinine.
Potential predictors included those considered clinically relevant to the development of CSA-AKI and available in the electronic health records, with less than 30% of observations missing. Potential predictors were divided into preoperative and intraoperative variables. Preoperative variables included patient demographics, preoperative conditions, laboratory tests, and medications. Preoperative conditions were determined according to the diagnosis on admission, preoperative diagnosis, and preoperative anesthesia interview records. The most recent preoperative measurements were used for laboratory variables. Medications were classified based on the Anatomical Therapeutic Chemical classification system and included if administered within 7 days before the surgery [
The primary outcome was the development of CSA-AKI, which was determined according to the 2012 Kidney Disease: Improving Global Outcomes (KDIGO) clinical practice guideline [
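The serum creatinine part of the KDIGO definition can be sketched as follows. This is a simplified illustration, not the study's implementation: it assumes the standard KDIGO thresholds (a rise of at least 26.5 µmol/L within 48 hours or to at least 1.5 times baseline within 7 days), ignores urine output, renal replacement therapy, and the pediatric eGFR criterion for stage 3, and uses a hypothetical function name and arguments.

```python
def kdigo_scr_stage(baseline_umol_l: float, peak_umol_l: float,
                    rise_within_48h_umol_l: float = 0.0) -> int:
    """Return the KDIGO AKI stage (0 = no AKI) from serum creatinine alone.

    baseline_umol_l: preoperative baseline SCr.
    peak_umol_l: highest SCr within 7 days after surgery.
    rise_within_48h_umol_l: largest absolute SCr rise seen in any 48-hour window.
    """
    ratio = peak_umol_l / baseline_umol_l
    has_aki = ratio >= 1.5 or rise_within_48h_umol_l >= 26.5
    if not has_aki:
        return 0
    # Stage 3: at least 3.0 times baseline, or SCr of at least 353.6 µmol/L (4.0 mg/dL).
    if ratio >= 3.0 or peak_umol_l >= 353.6:
        return 3
    # Stage 2: 2.0 to 2.9 times baseline.
    if ratio >= 2.0:
        return 2
    # Stage 1: 1.5 to 1.9 times baseline, or the absolute 48-hour rise alone.
    return 1


if __name__ == "__main__":
    for peak in (30, 40, 55, 80):
        print(peak, kdigo_scr_stage(25, peak))
```

With a low pediatric baseline such as 25 µmol/L, the relative criterion is reached well before the absolute rise of 26.5 µmol/L, so most pediatric cases are staged by the ratio.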
Descriptive statistics are presented as medians and interquartile ranges for continuous variables and as numbers and percentages for categorical variables. Data distributions were compared using the Mann-Whitney
Predictor variables with near-zero variance, identified as those with the percentages of unique values less than 5%, were removed from the analysis. Subsequently, 4 feature selection methods were used to obtain subsets of the predictor variables for further model development. The methods included Least Absolute Shrinkage and Selection Operator, Boruta algorithm, random forest-recursive feature elimination, and random forest-filtering. The results obtained by the 4 methods were comprehensively evaluated, and the predictor variables that appeared more than 3 times among the 4 methods were ultimately selected to build the model. Feature selection was conducted twice—first to include only the preoperative variables and then by combining the preoperative and intraoperative variables. The glmnet, Boruta, and caret packages in R were used for feature selection.
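The voting step across the 4 feature selection methods can be sketched as below. The selections shown are made-up placeholders rather than the study's results, the function name is hypothetical, and the vote threshold is left as a parameter; the value of 3 used in the example is an assumption.

```python
from collections import Counter


def vote_features(selections: dict[str, set[str]], min_votes: int) -> list[str]:
    """Keep features chosen by at least `min_votes` of the selection methods."""
    votes = Counter(feature for chosen in selections.values() for feature in chosen)
    return sorted(f for f, n in votes.items() if n >= min_votes)


# Hypothetical outputs of the 4 methods (illustrative feature names only).
SELECTIONS = {
    "lasso": {"baseline_scr", "body_length", "perfusion_time"},
    "boruta": {"baseline_scr", "body_length", "perfusion_time", "albumin"},
    "rf_rfe": {"baseline_scr", "perfusion_time", "albumin"},
    "rf_filter": {"baseline_scr", "body_length", "hemoglobin"},
}

if __name__ == "__main__":
    print(vote_features(SELECTIONS, min_votes=3))
```

Combining several selectors by vote trades some sensitivity for stability: a variable picked by only one method, such as `hemoglobin` in this toy input, is dropped.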
For model development, the following machine learning algorithms were applied to the preoperative-only and the combined data sets of the derivation cohort: K-nearest neighbor, naive Bayes, support vector machines, random forest, extreme gradient boosting (XGBoost), and neural networks. We performed 5-fold cross-validation repeated 5 times with random shuffling to reduce bias in the assessment of model performance and to identify the optimal hyperparameters for each model. Model performance was assessed based on the mean area under the receiver operating characteristic curve (AUROC) across the 25 (5×5) resulting folds. The best performing machine learning model was then chosen for each data set. The caret package in R was used for model development. The details on functions, packages, and tuning parameters used for each machine learning algorithm are provided in Table S2 of
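The resampling scheme, 5 random shuffles of 5-fold cross-validation giving 25 held-out folds, can be sketched in plain Python as follows. The study used the caret package in R; this is only an illustration of the idea, with hypothetical function names, and it does not stratify folds by outcome.

```python
import random
from statistics import mean


def repeated_kfold(n_samples, n_folds=5, n_repeats=5, seed=0):
    """Yield (train_idx, test_idx) for every fold of every shuffled repeat."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh random shuffle for each repeat
        for k in range(n_folds):
            test = order[k::n_folds]  # every n_folds-th shuffled index
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test


def mean_cv_score(n_samples, fit_and_score, **kwargs):
    """Average a per-fold score (for example, AUROC) over all folds and repeats.

    `fit_and_score(train_idx, test_idx)` is a caller-supplied callback that
    fits a model on the training indices and returns its held-out score.
    """
    scores = [fit_and_score(train, test)
              for train, test in repeated_kfold(n_samples, **kwargs)]
    return mean(scores), len(scores)


if __name__ == "__main__":
    splits = list(repeated_kfold(3278))
    print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Averaging over repeats reduces the dependence of the estimate on any single random partition, which matters when models are ranked by small differences in mean AUROC.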
The performance of the final prediction models was further evaluated in the external validation cohort. The metrics for model performance included AUROC, the area under the precision-recall curve (AUPRC), and the calibration plot. AUROC was used as the primary performance metric because it is independent of the thresholds in the setting of class imbalance. AUPRC is known to be more informative for class-imbalanced prediction tasks because it is sensitive to changes in the number of false-positive predictions [
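To make the two discrimination metrics concrete, the sketch below computes AUROC as the probability that a randomly chosen case scores higher than a randomly chosen non-case, and AUPRC as average precision (precision weighted by the increase in recall at each distinct score threshold). The function names are hypothetical, and the quadratic AUROC loop is written for clarity rather than speed.

```python
def auroc(y_true, y_score):
    """AUROC as the fraction of (case, non-case) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """AUPRC estimated as the sum of precision times recall gain over thresholds."""
    n_pos = sum(y_true)
    pairs = sorted(zip(y_score, y_true), reverse=True)
    tp = fp = 0
    ap = 0.0
    i = 0
    while i < len(pairs):
        tp_before = tp
        j = i
        # Process all observations sharing the same score as one threshold.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        ap += (tp / (tp + fp)) * (tp - tp_before) / n_pos
        i = j
    return ap


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    s = [0.1, 0.4, 0.35, 0.8]
    print(auroc(y, s), round(average_precision(y, s), 4))
```

A useful reference point when reading the two metrics side by side: an uninformative model has an AUROC of 0.5 regardless of prevalence, whereas its AUPRC is approximately the outcome prevalence.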
The Shapley additive explanations (SHAP) method was used to explore the interpretability of the final prediction models. SHAP is a unified approach to the interpretations of model predictions, which provides consistent and locally accurate attribution values for each feature within a prediction model, namely, the SHAP values [
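The attribution idea behind SHAP can be illustrated by computing exact Shapley values by brute force for a tiny, made-up risk function. This is a didactic sketch only: the feature names and effect sizes are hypothetical, `toy_value` stands in for the expected model output when only a subset of features is known, and practical SHAP implementations for tree models use far more efficient algorithms than this enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values; value_fn maps a frozenset of known features to an output."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


FEATURES = ["creatinine", "perfusion_time", "body_length"]


def toy_value(known):
    """Hypothetical predicted risk given the set of known features."""
    risk = 0.10  # baseline risk when nothing is known
    if "creatinine" in known:
        risk += 0.20
    if "perfusion_time" in known:
        risk += 0.10
    if {"creatinine", "perfusion_time"} <= known:
        risk += 0.05  # interaction, shared equally between the two features
    if "body_length" in known:
        risk -= 0.03
    return risk


if __name__ == "__main__":
    phi = shapley_values(toy_value, FEATURES)
    print({k: round(v, 3) for k, v in phi.items()})
```

The attributions sum to the difference between the full prediction and the baseline, which is the local accuracy property referred to in the text.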
Sensitivity analysis was performed to examine the predictive power of the models for CSA-AKI stages 2-3. Model performance was also evaluated in the subgroups, focusing on patients in different age groups (infancy: 1 month to 1 year; childhood: 2-10 years; adolescence: 11-18 years) [
A total of 3863 participants were enrolled in this study, that is, 3278 in the derivation cohort and 585 in the external validation cohort. The baseline characteristics and outcomes of the patients in the derivation and external validation cohorts are shown in
Baseline characteristics and outcomes of the patients in the derivation and external validation cohorts.
Variables | Derivation cohort (n=3278) | External validation cohort (n=585)
Demographics
Age (year), median (IQR) | 1 (0.5-4) | 4 (1-8)
Sex (male), n (%) | 1709 (52.1) | 288 (49.2)
Body length (cm), median (IQR) | 79 (65-105) | 100 (80-128)
Weight (kg), median (IQR) | 9.5 (6.0-16.0) | 15.0 (10.0-22.5)
Blood type, n (%)
Type A | 1100 (33.6) | 200 (34.4)
Type B | 744 (22.7) | 134 (23)
Type O | 1182 (36.1) | 197 (33.8)
Type AB | 252 (7.7) | 51 (8.8)
Preoperative conditions
Cyanotic heart disease, n (%) | 740 (22.6) | 113 (19.3)
Pulmonary hypertension, n (%) | 1763 (53.8) | 241 (41.2)
Pulmonary infection, n (%) | 293 (8.9) | 30 (5.1)
Infective endocarditis, n (%) | 44 (1.3) | 9 (1.5)
Previous cardiac surgery, n (%) | 181 (5.5) | 21 (3.6)
Genetic disease, n (%) | 72 (2.2) | 12 (2.1)
Noncardiac malformation, n (%) | 111 (3.4) | 15 (2.6)
Preoperative intensive care, n (%) | 168 (5.1) | 10 (1.7)
Preoperative length of stay (day), median (IQR) | 4 (2-7) | 6 (3-7)
American Society of Anesthesiologists class, n (%)
Ⅰ | 21 (0.6) | 0 (0)
Ⅱ | 589 (18) | 120 (21.2)
Ⅲ | 1978 (60.6) | 339 (59.8)
Ⅳ | 665 (20.4) | 108 (19)
Ⅴ | 12 (0.4) | 0 (0)
Laboratory tests
Baseline creatinine (µmol/L), median (IQR) | 25.3 (20.4-33.5) | 44.0 (36.0-53.0)
Baseline estimated glomerular filtration rate (mL/min/1.73 m²), median (IQR) | 118.0 (99.5-138.0) | 85.0 (72.4-97.1)
Left ventricular ejection fraction (%), median (IQR) | 69 (66-73) | 66 (62-70)
Hemoglobin (g/L), median (IQR) | 119 (107-129) | 125 (117-135)
Red blood cell distribution width (%), median (IQR) | 13.4 (12.7-14.7) | 13.6 (12.9-14.7)
White blood cells (×10⁹/L), median (IQR) | 8.0 (6.5-9.8) | 7.6 (6.3-9.5)
Platelets (×10⁹/L), median (IQR) | 314 (253-383) | 271 (223-326)
Dipstick albuminuria, n (%) | 57 (2.1) | 4 (0.8)
Blood urea nitrogen (mmol/L), median (IQR) | 4.08 (2.96-5.12) | 4.16 (3.22-5.05)
Total bilirubin (µmol/L), median (IQR) | 7.4 (5.1-10.9) | 7.3 (5.1-11.0)
Alanine aminotransferase (U/L), median (IQR) | 16.4 (11.7-25.3) | 14.2 (11.1-19.0)
Aspartate aminotransferase (U/L), median (IQR) | 34.9 (27.4-45.4) | 31.4 (25.6-39.1)
Albumin (g/L), median (IQR) | 40.3 (38.2-42.3) | 43.1 (41.0-45.4)
Potassium (mmol/L), median (IQR) | 4.80 (4.46-5.12) | 4.45 (4.19-4.73)
Sodium (mmol/L), median (IQR) | 138.3 (137.0-139.7) | 140.0 (138.7-141.1)
Chloride (mmol/L), median (IQR) | 103.2 (101.6-104.7) | 104.1 (102.7-105.5)
Calcium (mmol/L), median (IQR) | 2.40 (2.32-2.49) | 2.46 (2.37-2.54)
Medications, n (%)
Iodinated contrast media | 411 (12.5) | 73 (12.5)
Digoxin | 104 (3.2) | 11 (1.9)
Diuretics | 316 (9.6) | 38 (6.5)
Nonsteroidal anti-inflammatory drugs | 54 (1.6) | 12 (2.1)
Angiotensin converting enzyme inhibitor/angiotensin Ⅱ receptor blocker | 61 (1.9) | 4 (0.7)
Nephrotoxic antibiotics | 61 (1.9) | 1 (0.2)
Antiviral drugs | 163 (5) | 1 (0.2)
Intraoperative variables
Emergent surgery, n (%) | 168 (5.1) | 14 (2.4)
Operation time (min), median (IQR) | 155 (129-197) | 190 (165-230)
Perfusion time (min), median (IQR) | 58 (44-84) | 63 (46-90)
Cross clamp time (min), median (IQR) | 33 (23-51) | 37 (23-55)
Cardioversion, n (%) | 279 (8.5) | 73 (12.5)
Lowest mean arterial pressure (mmHg), median (IQR) | 35 (31-40) | 38 (31-45)
Lowest core temperature (°C), median (IQR) | 33.3 (31.8-34.4) | 33.3 (31.7-34.7)
Intraoperative blood loss (mL/kg), median (IQR) | 21.4 (14.8-31.3) | 20.0 (15.4-26.2)
Intraoperative fluid balance (%), median (IQR) | –0.7 (–1.9 to 0.1) | 1.6 (–0.2 to 2.9)
Risk Adjustment for Congenital Heart Surgery-1 category, n (%)
1 | 480 (14.9) | 92 (16.2)
2 | 2032 (63) | 386 (67.8)
3 | 653 (20.3) | 84 (14.8)
4 | 59 (1.8) | 7 (1.2)
Outcomes
Acute kidney injury, n (%) | 564 (17.2) | 51 (8.7)
Acute kidney injury stages 2-3, n (%) | 208 (6.3) | 26 (4.4)
In-hospital mortality, n (%) | 38 (1.2) | 5 (0.9)
Intensive care unit length of stay (day), median (IQR) | 2 (1-3) | 1 (1-2)
Hospital length of stay (day), median (IQR) | 8 (7-13) | 8 (7-9)
A total of 25 preoperative variables were selected as predictors of CSA-AKI by the 4 feature selection methods and included in the machine learning models (Table S5 of
Among the considered machine learning models, the XGBoost model achieved the best performance on both the preoperative-only and the combined data sets, with a mean AUROC of 0.795 and 0.832 in cross-validation, respectively (Table S7 of
Receiver operating characteristic curves of the extreme gradient boosting and traditional logistic regression models. (A-B) Receiver operating characteristic curves of the models with only the preoperative variables in the (A) derivation and (B) external validation cohorts. (C-D) Receiver operating characteristic curves of the models with the preoperative and intraoperative variables in the (C) derivation and (D) external validation cohorts. AUC: area under the curve; XGBoost: extreme gradient boosting.
The SHAP summary plots of the XGBoost models are shown in
Shapley additive explanations summary plots of the extreme gradient boosting models for cardiac surgery–associated acute kidney injury. (A) Shapley additive explanations summary plot of the extreme gradient boosting model with only the preoperative variables. (B) Shapley additive explanations summary plot of the extreme gradient boosting model with the preoperative and intraoperative variables. A dot is created for each feature attribution in calculating the output risk for each observation. ALT: alanine aminotransferase; ASA: American Society of Anesthesiologists; AST: aspartate aminotransferase; eGFR: estimated glomerular filtration rate; MAP: mean arterial pressure; RACHS: Risk Adjustment for Congenital Heart Surgery; RDW: red blood cell distribution width; SHAP: Shapley additive explanations.
The XGBoost models showed good predictive performance for CSA-AKI stages 2-3, with AUROCs higher than 0.85 in both the derivation and the external validation cohorts (Figure S5 of
Diagnostic test characteristics of the extreme gradient boosting models at the low- and high-risk cutoff points.
Models, cohorts | Cutoff | Sensitivity (%) | Specificity (%) | Positive predictive value (%) | Negative predictive value (%) | Positive likelihood ratio | Negative likelihood ratio
Model with preoperative variables only, derivation cohort
Low-risk cutoff | 0.103 | 95 | 52.7 | 29.4 | 98.1 | 2.01 | 0.09
High-risk cutoff | 0.365 | 55.3 | 95 | 69.8 | 91.1 | 11.12 | 0.47
Model with preoperative variables only, external validation cohort
Low-risk cutoff | 0.103 | 84.3 | 80.1 | 28.9 | 98.2 | 4.25 | 0.20
High-risk cutoff | 0.365 | 7.8 | 99.8 | 80 | 91.9 | 41.88 | 0.92
Model with preoperative and intraoperative variables, derivation cohort
Low-risk cutoff | 0.099 | 95 | 58.3 | 32.2 | 98.3 | 2.28 | 0.09
High-risk cutoff | 0.374 | 60.3 | 95 | 71.6 | 92 | 12.12 | 0.42
Model with preoperative and intraoperative variables, external validation cohort
Low-risk cutoff | 0.099 | 80.4 | 80.5 | 28.3 | 97.7 | 4.13 | 0.24
High-risk cutoff | 0.374 | 27.5 | 98.7 | 66.7 | 93.4 | 20.94 | 0.74
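The columns of this table are linked by standard identities, which can be sketched as follows. The function name is hypothetical, and the example takes the row with 84.3% sensitivity and 80.1% specificity together with the external validation prevalence of 51/585 as inputs.

```python
def diagnostic_summary(sensitivity: float, specificity: float, prevalence: float) -> dict:
    """Derive predictive values and likelihood ratios from sensitivity, specificity, prevalence."""
    tp = sensitivity * prevalence              # true-positive fraction of the cohort
    fn = (1 - sensitivity) * prevalence        # false-negative fraction
    tn = specificity * (1 - prevalence)        # true-negative fraction
    fp = (1 - specificity) * (1 - prevalence)  # false-positive fraction
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
    }


if __name__ == "__main__":
    result = diagnostic_summary(0.843, 0.801, 51 / 585)
    print({k: round(v, 3) for k, v in result.items()})
```

The derived values are close to the tabulated ones; small differences are expected because the inputs here are already rounded. The calculation also shows why the positive predictive value stays modest at the low-risk cutoff: with a prevalence below 10%, even a specificity of about 80% yields more false positives than true positives.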
In this multicenter retrospective study, we developed and externally validated prediction models for pediatric CSA-AKI by using machine learning approaches. Multiple machine learning algorithms were tested in the process of model development, with the XGBoost algorithm ultimately identified as offering the strongest discrimination. In addition, the XGBoost models showed promising predictive performance on both the preoperative-only and combined data sets, demonstrating their potential usefulness for predicting pediatric CSA-AKI. To the best of our knowledge, our study is the first to establish machine learning models for CSA-AKI in the pediatric population that are valuable for risk stratification and clinical decision-making.
Previous studies have shown the advantages of machine learning algorithms in predicting CSA-AKI in adults [
Both preoperative and intraoperative factors proved to contribute to the prediction of postoperative AKI. Tseng et al [
The SHAP method was used to open the black box of the XGBoost models. This method is a model-agnostic explanation technique that has been widely used to interpret the contribution of predictors to the model output [
Our findings have significant clinical implications. First, the low- and high-risk cutoff values were identified to promote the clinical application of the XGBoost models. This should allow the care team to identify the patients at high risk of CSA-AKI and to develop optimal perioperative management strategies. Second, our models used the preoperative and intraoperative variables that are routinely collected in clinical practice, thus adding no extra laboratory tests or financial burdens to the standard clinical care procedures. Third, the discovery of modifiable predictors may promote early interventions to mitigate the risk of CSA-AKI.
Our study has several limitations. First, data were retrospectively collected from electronic health records. Second, the study population was restricted to tertiary medical institutions, as pediatric cardiac surgery is typically not offered in primary health care institutions in China. Thus, the applicability of our prediction models needs further validation in diverse populations. Third, the urine output criteria were not used to define CSA-AKI because hourly urine output data were not available for most patients. However, given the routine use of diuretics in the intraoperative and postoperative periods to maintain urine output, we expect that few patients with CSA-AKI were missed in this study. Finally, the causality between the predictors and CSA-AKI needs further exploration. Randomized controlled trials would be needed to verify whether modifying certain predictors can prevent the occurrence of CSA-AKI.
Our study demonstrates the applicability of machine learning approaches in predicting the development of CSA-AKI in the pediatric population. The XGBoost models had consistent and clinically applicable performance in the derivation and external validation cohorts, which indicates their robustness and generalizability. Additionally, the predictive value of the preoperative and intraoperative factors was demonstrated by the improved performance of the model when these factors were combined. Ultimately, our models should prove useful in assisting practitioners with risk stratification and clinical decision-making in pediatric patients undergoing cardiac surgery.
Data types and missing values for the variables of interest.
Functions, packages, and tuning parameters used for each machine learning algorithm.
Illustration of the framework of model establishment.
Baseline characteristics and outcomes of patients with and without cardiac surgery–associated acute kidney injury in the derivation and external validation cohorts.
Predictor variables selected by the 4 feature selection methods.
Performance of the machine learning models in cross-validation.
Performance of the extreme gradient boosting models for cardiac surgery–associated acute kidney injury.
Shapley additive explanations dependence plots for the association between the predictors and cardiac surgery–associated acute kidney injury.
Receiver operating characteristic curves of the extreme gradient boosting models for cardiac surgery–associated acute kidney injury stages 2-3.
Receiver operating characteristic curves of the extreme gradient boosting models for cardiac surgery–associated acute kidney injury in the subgroups.
Receiver operating characteristic curves of the extreme gradient boosting models for cardiac surgery–associated acute kidney injury trained on the balanced derivation cohort.
AUPRC: area under the precision-recall curve
AUROC: area under the receiver operating characteristic curve
CSA-AKI: cardiac surgery–associated acute kidney injury
eGFR: estimated glomerular filtration rate
KDIGO: Kidney Disease: Improving Global Outcomes
RACHS-1: Risk Adjustment for Congenital Heart Surgery-1
SCr: serum creatinine
SHAP: Shapley additive explanations
XGBoost: extreme gradient boosting
The authors would like to acknowledge Rui-Xue Ma (Xiangya Hospital of Central South University) and Qiong-qiong Wu (the Second Xiangya Hospital of Central South University) for their help in data collection. The authors would like to express their gratitude to EditSprings for the expert linguistic services provided.
SBD designed and supervised this study and revised this manuscript. XQL and YXK performed data extraction, analyzed and interpreted the data, and drafted the manuscript. PY and GBS interpreted the data and critically revised the manuscript. NYZ, SKY, JXL, and HZ performed data extraction and revised the manuscript critically for important intellectual content. All authors read and approved the final manuscript.
None declared.