This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
Kidney transplantation is the optimal treatment for patients with end-stage renal disease. Short- and long-term kidney graft survival is influenced by a number of donor and recipient factors. Predicting the success of kidney transplantation is important for optimizing kidney allocation.
The aim of this study was to predict the risk of kidney graft failure across three temporal cohorts (within 1 year, within 5 years, and after 5 years following a transplant) based on donor and recipient characteristics. We analyzed a large data set comprising over 50,000 kidney transplants covering an approximate 20-year period.
We applied machine learning–based classification algorithms to develop prediction models for the risk of graft failure for three different temporal cohorts. Deep learning–based autoencoders were applied for data dimensionality reduction, which improved the prediction performance. The influence of features on graft survival for each cohort was studied by investigating a new nonoverlapping patient stratification approach.
Our models predicted graft survival with area under the curve scores of 82% within 1 year, 69% within 5 years, and 81% within 17 years. The feature importance analysis elucidated the varying influence of clinical features on graft survival across the three different temporal cohorts.
In this study, we applied machine learning to develop risk prediction models for graft failure that demonstrated a high level of prediction performance. Given that these models performed better than the existing risk prediction tools reported in the literature, future studies will focus on how best to incorporate these prediction models into clinical care algorithms to optimize the long-term health of kidney recipients.
Kidneys are vital for the health of an individual, as they filter waste products from the blood and produce hormones and urine [
Kidney transplantation [
Prediction modeling using machine learning (ML) algorithms has gained attention in recent years [
In this study, we investigate kidney transplant allograft survival, that is, the time-to-event and the evolving influence of clinical features leading to an event, within three temporal cohorts: 1 year, >1-5 years, and >5 years after a kidney transplant. We predicted the outcome of graft failure after kidney transplant based on the analysis of donor and recipient features. We applied ML methods to (1) predict the graft status over different temporal periods and (2) analyze the changing effect of donor-recipient–related predictors across these periods. To develop the prediction models, we analyzed a large data set of over 50,000 kidney transplants performed in the United States over approximately 20 years. To generate clinically meaningful temporal cohorts, we experimented with two patient stratification approaches: (1) a novel nonoverlapping approach and (2) the traditional overlapping approach.
The contributions of this research are as follows: (1) ML-based prediction models that are trained on a large data set, offering improved prediction performance compared with previous studies (previous graft prediction studies are based on a smaller number of transplants over a shorter period); (2) data dimensionality reduction based on a deep learning framework to handle the high-dimensional and complex kidney transplant data set; (3) a novel nonoverlapping patient stratification approach to provide fine-grained feature importance within a specific period while avoiding bias from preceding cohorts; (4) explaining the influence of the different clinical features, during different periods, toward the prediction performance of ML prediction models. This finding allows the selection of the most important features to predict graft outcomes within a specific temporal window; and (5) a comparison between the two stratification approaches with respect to the performance of the prediction models. The future practical outcome of this study is the provision of a data-driven decision support tool to assist nephrologists in the kidney allocation process by identifying the best donor and recipient pair that will lead to the highest likelihood of graft survival for a given recipient.
Patients can receive a kidney from either deceased donors or living donors. The donor-recipient matching process becomes relatively more complex with deceased donors because of the need to account for additional clinical factors (ie, prolonged cold ischemia time, prolonged wait times, and generally lower quality organs) [
Data-driven methods are now being used for organ matching; these methods are used to establish clinical compatibility beyond the blood group and tissue type. Conventional data-driven prediction methods use statistical techniques such as Cox proportional hazard models and Kaplan-Meier estimates to perform time-to-event analyses [
ML-based analysis to predict outcomes is usually performed using classification methods, whereas regression methods are used for time-to-event analysis. There are two prominent approaches to predicting kidney allograft outcomes using ML-based classification methods. The first approach is to predict graft survival over time by dividing a longitudinal data set into different time cohorts based on the occurrence of a given adverse event or the last follow-up date from the date of transplant. Each time cohort has a binary target variable, that is, success or failure of the graft, which is used to train the classification model to predict graft survival [
Due to the high dimensionality of existing data sets for organ transplantation, feature selection is applied to filter out redundant features. A stacked autoencoder, which is an unsupervised neural network, is an efficient dimensionality reduction technique with promising performance for deep representation of medical data [
Right-censored data are a common problem in survival analysis, as they represent cases for which the adverse event is not available or recorded because the subject was either lost to follow-up or did not experience the event during the study period. Multiple approaches have been adopted in previous studies to address this problem. The study on kidney transplants by Topuz et al [
The influence of clinical features (or clinical predictors) on graft survival tends to vary over time [
In previous studies [
This study is organized into five major sections:
To predict graft survival over time and to analyze the influence of clinical features on graft survival, our data analytics methodology (
Overview of our data analytics methodology. AUC: area under the curve; SMOTE: synthetic minority oversampling technique; UNOS: United Network for Organ Sharing.
This study used data from the Scientific Registry of Transplant Recipients (SRTR). The SRTR data system includes data on all donors, wait-listed candidates, and transplant recipients in the United States, submitted by the members of the Organ Procurement and Transplantation Network. The Health Resources and Services Administration, US Department of Health and Human Services, provides oversight of the activities of the Organ Procurement and Transplantation Network and SRTR contractors.
The data set provided pretransplant clinical features and outcomes of 277,316 kidney transplants between 2000 and 2017. Survival was reported in terms of graft outcome and patient status. For the purposes of this study, graft failure was defined as (1) graft loss or (2) death with a functioning graft.
We analyzed the data and used only complete cases (ie, no missing feature values), which comprised a total of 52,827 kidney transplants.
List of clinical features used to train the prediction models.
Feature description | Data type | Abbreviation |
Peak panel reactive antibody | Continuous | PKPRA |
Type of transplant | Categorical | REC_TX_PROCEDURE |
Any previous kidney transplant | Categorical | PREVKI |
Donor age | Continuous | DAGE |
Donor height | Continuous | DHT100 |
Recipient height | Continuous | RHT2100 |
Donor weight | Continuous | DWT |
Recipient weight | Continuous | RWT2 |
Donor creatinine level | Continuous | DONCREAT |
Expanded criteria donor | Categorical | ECD |
Donation after cardiac death | Categorical | DCD |
Donor hypertension | Categorical | DHTN2 |
Recipient hypertension | Categorical | RHTN |
Recipient BMI | Continuous | RBMI2 |
Donor BMI | Continuous | DBMI |
Cold ischemia time | Continuous | CIT |
Recipient age | Continuous | RAGETX |
Number of HLA antigen mismatches (paired) | Categorical | HLAMM |
Functional status of the recipient | Categorical | FUNCTSTAT |
Donor-recipient sex (paired) | Categorical | DRSEX |
Donor-recipient race (paired) | Categorical | DRRACE |
Donor-recipient age (paired) | Categorical | DRAGE |
Recipient cardiovascular disease | Categorical | RCVD |
Donor hepatitis C virus | Categorical | DHCV |
Recipient peripheral vascular disease | Categorical | RPVD |
Donor race | Categorical | DRACESIMP |
Recipient race | Categorical | RRACESIMP |
Recipient malignancy | Categorical | RMALIG |
Years on dialysis pretransplant | Continuous | VINTAGE |
Donor diabetes | Categorical | DDM |
Preemptive transplant | Categorical | PREEMPTIVE |
Recipient diabetes | Categorical | RDM2 |
Recipient coronary artery disease | Categorical | RCAD |
Simplified ESRDa diagnosis | Categorical | ESRDDXSIMP |
Donor-recipient CMVb (paired) | Categorical | DRCMV |
Donor-recipient height difference | Categorical | AHD1 |
Donor-recipient weight difference | Categorical | DRWT |
aESRD: end-stage renal disease.
bCMV: cytomegalovirus.
Data preparation for learning the ML-based prediction models consisted of data cleaning, partitioning the data set into temporal cohorts, and addressing class imbalances.
Data cleaning involved removing (1) all patient identifying features (such as transplant ID, donor ID, and patient ID) [
Given the longitudinal data set, we generated two distinct data sets using traditional
Derivation of the overlapped cohorts.
Derivation of the nonoverlapped cohorts.
The overlapping patient stratification approach (used in previous graft status prediction studies) provides a cumulative analysis of graft outcomes up to a specific time point. In our study, the overlapping data stratification resulted in the following three cohorts: cohort 1, spanning from year0 to year1, which reported graft outcomes (ie, graft failure or survival) during this period; cohort 2, which reported graft status from year0 to year5 and overlapped with cohort 1 such that it included the patients in cohort 1
Our nonoverlapping patient stratification approach yielded three cohorts: cohort 1
When partitioning the data into cohorts, we accounted for the presence of censored data, that is, the lack of information about the occurrence of an adverse event for a surviving patient. There is no concrete method to determine survivors when confronted with censored data. We initially assumed that patients who did not fail in a certain cohort could be presumed to be survivors. However, this assumption led to two problems: (1) it included censored patients who might have experienced graft failure during the study, and (2) it led to a severe class imbalance between graft failure and surviving patients. To overcome these problems, we took a two-phase heuristic approach to remove the censored observations to identify
The first part of this equation illustrates the first phase of the proposed approach. The
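The labelling logic described above can be sketched as follows. This is a minimal, hypothetical illustration (the record fields `graft_failed` and `years_to_event` are invented names, not the SRTR variables): failures are assigned to the cohort in which they occurred, whereas apparent survivors are retained only if their follow-up extends past both the cohort boundary and the 8-year threshold this study used to drop censored successful cases.

```python
def cohort_label(rec, start, end, min_followup=8):
    """Label a transplant for the nonoverlapping cohort covering (start, end]
    years after transplant. Returns 'failed', 'survived', or None (excluded).

    rec["years_to_event"] holds the failure time for failed grafts, or the
    last follow-up time for censored/surviving grafts (hypothetical layout).
    """
    t = rec["years_to_event"]
    if rec["graft_failed"]:
        if t <= start:
            return None                      # failed before this cohort began
        return "failed" if t <= end else "survived"
    # No failure recorded: count as a survivor only if followed long enough,
    # so right-censored cases are not mistaken for long-term survivors.
    if t > end and t >= min_followup:
        return "survived"
    return None                              # censored: removed from the cohort
```

For example, a graft that failed at 3 years counts as a survivor of the 0-1 year cohort but as a failure in the >1-5 year cohort; a success censored at 3 years is excluded from both.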
Number of failed and survived transplants in overlapped and nonoverlapped cohorts.
Cohort | Overlapping: count, n | Overlapping: failed, n (%) | Overlapping: survived, n (%) | Nonoverlapping: count, n | Nonoverlapping: failed, n (%) | Nonoverlapping: survived, n (%)
1 | 52,827 | 7554 (14.3) | 45,273 (85.7) | 52,827 | 7554 (14.3) | 45,273 (85.7)
2 | 52,827 | 23,475 (44.44) | 29,352 (55.56) | 45,273 | 15,921 (35.17) | 29,352 (64.83)
3 | 52,827 | 37,939 (71.82) | 14,888 (28.18) | 29,352 | 14,464 (49.28) | 14,888 (50.72)
Our data set had two outcomes: the presence or absence of graft failure. There was a significant class imbalance whereby the
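The oversampling idea behind SMOTE can be illustrated with a minimal sketch (the study would have relied on an established implementation such as imbalanced-learn; this toy version shows only the core interpolation step): synthetic minority samples are generated along the line segment between a minority case and one of its nearest minority-class neighbours.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Minimal SMOTE-style sketch: create n_new synthetic minority samples
    by interpolating a randomly chosen minority sample toward one of its
    k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        dists = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(dists)[1:k + 1]   # skip the sample itself
        j = rng.choice(neighbours)
        gap = rng.random()                        # position along the segment
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```

Because each synthetic sample lies between two real minority cases, the oversampled class stays inside the region the minority class already occupies rather than duplicating existing rows.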
This step involved both the removal and construction of features with the intent to reduce the dimensionality of the feature space.
A set of paired features was constructed by combining the related features. Typically, graft predictions use discrete individual donor and recipient features. We examined the underlying correlation between the donor and recipient features and paired the highly related features to generate new
We transformed the categorical features into multiple dummy features to make them compatible with the stacked autoencoders, which cannot process categorical features directly. This transformation was also necessary because of a functional constraint of the scikit-learn library [
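The pairing and dummy-encoding steps can be sketched as follows. The field names and the string-based coding here are illustrative only (the study's dummy columns were coded numerically, e.g. ECD_0 and ECD_1, and paired features such as DRSEX combined the donor and recipient values of a clinical attribute):

```python
def pair(donor_val, recip_val):
    """Combine a donor and a recipient category into one paired feature
    (illustrative coding for paired features such as DRSEX)."""
    return f"{donor_val}-{recip_val}"

def one_hot(rows, feature):
    """Expand one categorical feature into 0/1 dummy columns, as required
    by the stacked autoencoders and scikit-learn estimators."""
    categories = sorted({row[feature] for row in rows})
    for row in rows:
        value = row.pop(feature)
        for cat in categories:
            row[f"{feature}_{cat}"] = int(value == cat)
    return [f"{feature}_{cat}" for cat in categories]

# Hypothetical records: pair donor/recipient sex, then dummy-encode the pair.
rows = [{"DSEX": "M", "RSEX": "F"}, {"DSEX": "F", "RSEX": "F"}]
for r in rows:
    r["DRSEX"] = pair(r.pop("DSEX"), r.pop("RSEX"))
cols = one_hot(rows, "DRSEX")
```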
Finally, we used 86 transformed categorical features as inputs to the stacked autoencoders for feature reduction to subsequently train the ML prediction models. Continuous features were also initially considered as a part of the input vector to stacked autoencoders (Table S1 in
After testing with different configuration settings provided in the Keras framework [
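As a framework-agnostic sketch of the autoencoder idea (the study built stacked autoencoders in Keras; this toy version has a single bottleneck layer and NumPy-only training, and a stacked variant would train several such layers in sequence), the network learns a compressed code by minimizing the reconstruction error of its own input:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden=8, epochs=300, lr=0.5):
    """Train a single-bottleneck autoencoder (encoder + decoder) with
    full-batch gradient descent on the mean squared reconstruction error."""
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)         # encoder: compressed representation
        X_hat = sigmoid(H @ W2 + b2)     # decoder: reconstruction of the input
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        dZ2 = err * X_hat * (1 - X_hat)  # backprop through the decoder
        dZ1 = (dZ2 @ W2.T) * H * (1 - H) # backprop through the encoder
        W2 -= lr * H.T @ dZ2 / n; b2 -= lr * dZ2.mean(0)
        W1 -= lr * X.T @ dZ1 / n; b1 -= lr * dZ1.mean(0)
    return (W1, b1, W2, b2), losses

# Toy stand-in for the 86 dummy features: binary data driven by 3 latent factors.
Z = rng.random((200, 3))
X = (Z @ rng.random((3, 20)) > 0.75).astype(float)
(W1, b1, W2, b2), losses = train_autoencoder(X)
codes = sigmoid(X @ W1 + b1)             # reduced features fed to the classifiers
```

After training, only the encoder output (`codes`) is kept and concatenated with the continuous features as classifier input; the decoder exists solely to force the bottleneck to preserve information.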
Prediction was pursued as a binary classification problem, where the prediction output represents the graft outcome for a given patient in terms of the class label, graft failure or survived. We investigated four different ML-based classification models for each time cohort (ie, cohorts 1-3). Given that logistic regression (LR) has been widely used in prior studies to develop graft prediction models [
All classification models were trained using a 10-fold stratified cross-validation approach. The stratification ensured that the outcome class ratio was maintained in each fold, avoiding any sampling bias that might affect the classification results. We mainly used the scikit-learn library [
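The stratified splitting logic can be sketched without any library (the study used scikit-learn's implementation): indices are shuffled within each class and dealt round-robin across folds, so every fold preserves the overall class ratio.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=0):
    """Split indices into k folds, each preserving the class ratio of
    `labels` (the idea behind stratified cross-validation)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):     # deal class members round-robin
            folds[j % k].append(i)
    return folds

# Cohort 1 has roughly a 14.3% graft failure rate; each fold keeps it.
labels = [1] * 143 + [0] * 857
folds = stratified_folds(labels)
failure_rates = [sum(labels[i] for i in fold) / len(fold) for fold in folds]
```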
Algorithmic settings for the classifiers.
Method, Hyperparameter | Values

RFa
Number of estimators | 200
Class weight | Balanced
Criterion | Gini
Maximum depth | 9
Minimum samples split | 2 for cohort 1; 3 for the rest
Maximum features | 14

SVM
Cb | 50
Gamma | Auto, scale
Decision function shape | One versus rest
Kernel | Radial

ANN
Solver | Adam
Learning rate | Adaptive
Activation | Logistic
Alpha | 1e-2, 1e-6
Hidden layers |

AdaBoost
Base learner | RF
Number of estimators | 401
Learning rate | 1
Algorithm | SAMME.R

LR
Penalty | l2
C | 10
Class weight | Balanced
Max iteration | 1000
Solver | sag
aRF: random forest.
bC: regularization parameter.
Random forest (RF) was used as both a standalone classifier and a base learner for the adaptive boosting (AdaBoost) algorithm. It has been widely used to predict survival data [
The AdaBoost algorithm was applied to two weak learners, RF and LR. The study by Thongkam et al [
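The boosting mechanics can be shown in a self-contained sketch. The study boosted RF and LR base learners via scikit-learn's AdaBoost implementation; here, one-feature decision stumps stand in as the weak learner for brevity, while the sample-reweighting loop is the same idea:

```python
import numpy as np

def stump_fit(X, y, w):
    """Best one-feature threshold stump under sample weights w (y in {-1,+1})."""
    best = (0, 0.0, 1, np.inf)                  # feature, threshold, polarity, error
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for pol in (1, -1):
                pred = np.where(X[:, f] <= t, pol, -pol)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (f, t, pol, err)
    return best

def adaboost_fit(X, y, rounds=5):
    """AdaBoost: reweight samples so each new learner focuses on past mistakes."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(rounds):
        f, t, pol, err = stump_fit(X, y, w)
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # this learner's vote weight
        pred = np.where(X[:, f] <= t, pol, -pol)
        w *= np.exp(-alpha * y * pred)          # up-weight misclassified samples
        w /= w.sum()
        ensemble.append((alpha, f, t, pol))
    return ensemble

def adaboost_predict(ensemble, X):
    score = sum(a * np.where(X[:, f] <= t, pol, -pol) for a, f, t, pol in ensemble)
    return np.where(score >= 0, 1, -1)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([-1, -1, 1, 1])
model = adaboost_fit(X, y)
```

Replacing the stump with a stronger learner such as RF, as done in this study, keeps the same weighting scheme but gives each boosting round a more expressive hypothesis.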
A backpropagation algorithm was used to train a neural network–based binary classifier. Generally, artificial neural networks (ANNs) perform well on survival data sets [
Classification models using support vector machines (SVMs) have been applied to predict survival data [
The nonoverlapped time cohorts were used to calculate feature importance scores to understand the changing relevance of features over time. We calculated these scores by training an RF classifier on the complete data set, using the mean decrease in Gini impurity. These importance scores were then used to understand the effect of features across the three cohorts.
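The quantity underlying Gini-based importance is the impurity decrease produced by each split; an RF importance score for a feature is, in essence, the (weighted) sum of these decreases over every split made on that feature. A minimal sketch:

```python
def gini_impurity(labels):
    """Gini impurity of a set of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def gini_decrease(parent, left, right):
    """Weighted decrease in Gini impurity produced by one split; random
    forest importance accumulates these decreases per splitting feature."""
    n = len(parent)
    return gini_impurity(parent) - (
        len(left) / n * gini_impurity(left)
        + len(right) / n * gini_impurity(right)
    )

# A perfectly separating split removes all impurity.
parent = ["failed"] * 5 + ["survived"] * 5
gain = gini_decrease(parent, ["failed"] * 5, ["survived"] * 5)
```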
Below, we present the prediction performance of the four ML classifiers using both overlapped and nonoverlapped cohorts. As LR has been extensively used to predict time-to-event in organ transplant studies [
Area under the curve comparison—all features with auto-encoded features.
Cohort, Model | Overlapped: all features (%) | Overlapped: continuous + auto-encoded features (%) | Nonoverlapped: all features (%) | Nonoverlapped: continuous + auto-encoded features (%)

Cohort 1
SVMa | 80 | 82 | N/Ab | N/A
AdaBoostc | 76 | 78 | N/A | N/A
RFd | 68 | 70 | N/A | N/A
ANNe | 62 | 61 | N/A | N/A
LRf | 62 | 62 | N/A | N/A

Cohort 2
SVM | 63 | 66 | 53 | 53
AdaBoost | 67 | 69 | 64 | 60
RF | 62 | 65 | 65 | 67
ANN | 62 | 62 | 62 | 62
LR | 62 | 62 | 64 | 61

Cohort 3
SVM | 73 | 80 | 68 | 65
AdaBoost | 76 | 81 | 68 | 64
RF | 72 | 75 | 68 | 66
ANN | 73 | 72 | 68 | 65
LR | 69 | 69 | 62 | 64
aSVM: support vector machine.
bN/A: not applicable.
cAdaBoost: adaptive boosting.
dRF: random forest.
eANN: artificial neural network.
fLR: logistic regression.
The AUC scores (
Although ANN and LR (the baseline model) showed no significant improvement across all three cohorts, the results confirmed the effectiveness of our deep learning architecture of stacked autoencoders for feature selection. For the subsequent prediction modeling analysis, we used the reduced feature set.
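The AUC metric reported throughout is equivalent to the probability that a randomly chosen graft-failure case receives a higher predicted score than a randomly chosen survivor (the Mann-Whitney formulation), which a small sketch makes concrete:

```python
def auc(y_true, scores):
    """AUC as the probability that a random positive (e.g. graft failure)
    case is scored above a random negative (survivor) case; ties count 0.5."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` counts three of the four positive-negative pairs as correctly ordered, giving 0.75; an uninformative model scores 0.5.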
Prediction performance of the machine learning classifiers across three different temporal cohorts using the overlapped patient stratificationa.
Cohort, Model | AUCb (%), mean (SD) | F1 (%), mean (SD) | Recall (%), mean (SD) | Precision (%), mean (SD)

Cohort 1
SVMc | 82 | | |
AdaBoostd | 78 (0.01) | 56 (0.01) | 95 (0.01) | 35 (0.01)
RFe | 70 (0.009) | 45 (0.001) | 47 (0.01) | 41 (0.01)
ANNf | 61 (0.01) | 5 (0.001) | 42 (0.01) | 6 (0.004)
LRg | 62 (0.008) | 39 (0.009) | 58 (0.01) | 29 (0.04)

Cohort 2
SVM | 66 (0.006) | 53 (0.01) | 55 (0.01) | 60 (0.01)
AdaBoost | 69 | | |
RF | 65 (0.009) | 62 (0.01) | 62 (0.01) | 61 (0.01)
ANN | 63 (0.007) | 60 (0.04) | 55 (0.09) | 60 (0.01)
LR | 62 (0.008) | 59 (0.009) | 58 (0.01) | 60 (0.004)

Cohort 3
SVM | 80 (0.005) | 83 (0.003) | 76 (0.003) | 96 (0.003)
AdaBoost | 81 | | |
RF | 75 (0.008) | 75 (0.006) | 75 (0.01) | 73 (0.01)
ANN | 72 (0.007) | 68 (0.005) | 81 (0.03) | 69 (0.01)
LR | 69 (0.001) | 77 (0.009) | 70 (0.01) | 70 (0.001)
aItalics show the classifiers with the highest performance among the three cohorts.
bAUC: area under the curve.
cSVM: support vector machine.
dAdaBoost: adaptive boosting.
eRF: random forest.
fANN: artificial neural network.
gLR: logistic regression.
The classifiers performed differently across the three cohorts—SVM offered the highest prediction performance for short-term predictions, that is, for cohort 1, whereas AdaBoost offered the highest performance for the remaining cohorts. The SD across the different folds was nominal, confirming the stability of the classifiers.
Receiver operating characteristic curves for support vector machine, adaptive boosting, and adaptive boosting for the three cohorts, respectively (left to right). AUC: area under the curve; ROC: receiver operating characteristic.
To further investigate the prediction efficacy of the ML-based classifiers, we evaluated the best-performing classifier for each cohort by testing it on data from the other cohorts; for example, the classifier trained for cohort 2 was tested with randomly selected data from cohorts 1 and 3. The underlying assumption is that a classifier should not produce good prediction results for data from other cohorts. As this evaluation considers survivors across progressive cohorts, we used the F1 score to measure prediction performance. A sound prediction model for cohort 2 will give a high graft failure prediction score for data from cohort 1 but a low prediction score for data from cohort 3, the rationale being that the overlapping cohort 2 classifier is trained on graft failure cases in both cohorts 1 and 2
Prediction performance (F1 scores) for cross-cohort predictions using overlapped cohorts.
Model | Cohort 1 | Cohort 2 | Cohort 3 |
SVMa (cohort 1) | 0.6 | 0.42 | 0.29 |
AdaBoostb (cohort 2) | 0.79 | 0.87 | 0.58 |
AdaBoost (cohort 3) | 0.72 | 0.75 | 0.87 |
aSVM: support vector machine.
bAdaBoost: adaptive boosting.
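The F1 score used in this cross-cohort evaluation is the harmonic mean of precision and recall for the failure class; a minimal sketch:

```python
def f1_score(y_true, y_pred, positive="failed"):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Unlike accuracy, F1 ignores true negatives, which makes it a more informative summary when the failure class is the one of interest across cohorts with different class balances.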
To determine if the prediction differences between the different models were statistically significant, we used the Wilcoxon signed-rank test to compare the scores between different models. Because the best scores in each cohort were usually produced by SVM and AdaBoost models, the Wilcoxon signed-rank test was conducted with each combination of these models with the other models.
Results of the Wilcoxon signed-rank test (P values, based on F1 scores).
Cohort | SVMa-AdaBoostb,c | SVM-ANNd,e | SVM-RFf,g | AdaBoost-RFh | AdaBoost-ANNi
Cohort 1 | .003 | .003 | .003 | .003 | .003 |
Cohort 2 | .003 | .003 | .003 | .003 | .03 |
Cohort 3 | <.001 | <.001 | <.001 | <.001 | <.001 |
aSVM: support vector machine.
bAdaBoost: adaptive boosting.
cHo (null hypothesis): SVM=AdaBoost; Ha (alternative hypothesis): SVM≠AdaBoost.
dANN: artificial neural network.
eHo: SVM=ANN; Ha: SVM≠ANN.
fRF: random forest.
gHo: SVM=RF; Ha: SVM≠RF.
hHo: AdaBoost=RF; Ha: AdaBoost≠RF.
iHo: AdaBoost=ANN; Ha: AdaBoost≠ANN.
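For reference, the signed-rank statistic underlying these P values can be computed as follows (a stdlib sketch of the statistic only; in practice a standard implementation such as scipy.stats.wilcoxon, which also returns the P value, would be used):

```python
def wilcoxon_w(a, b):
    """Wilcoxon signed-rank statistic: rank the absolute paired differences
    (average ranks for ties), then take the smaller of the positive-rank and
    negative-rank sums. Zero differences are discarded."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and abs(diffs[order[j]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j + 1) / 2            # 1-based ranks averaged over the tie block
        for k in range(i, j):
            ranks[order[k]] = avg
        i = j
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus)
```

A small W relative to its null distribution (for the paired per-fold F1 scores of two classifiers) yields the small P values reported above.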
The second objective of this research is to analyze the influence of clinical features on the prediction of graft survival over different periods. The intent was to understand the factors responsible for graft survival at different periods after transplant. The nonoverlapped cohorts (0-1 years, >1-5 years, and >5-17 years following a transplant) were used to ensure that there was no cascading influence of the features over time. For comparison purposes, we also examined the feature importance for overlapping cohorts. The feature importance scores represent the relative importance of a feature among all features, that is, the importance scores of all features sum to 100%; hence, if one feature gains a higher importance score, it does so at the expense of the scores of other features.
Changing relevance of features based on nonoverlapped time cohorts.
Changing relevance of features based on overlapped time cohorts.
In general, the top 10% of the important features remained consistent in both the nonoverlapped and overlapped cohorts; however, the nonoverlapped cohorts identified a larger group of important features. For instance, peak panel reactive antibody (PKPRA) and pre-emptive recipient status (PREEMPTIVE) had negligible importance in the overlapping cohorts but were important during the >1-5 year and >5-17 year periods in the nonoverlapping cohorts.
Ranking of the top-10 features across the time cohorts with feature importance scoresa.
Rank | Cohort 1: feature, relative score (%) | Cohort 2: feature, relative score (%) | Cohort 2: importance change (%), rank change | Cohort 3: feature, relative score (%) | Cohort 3: importance change (%), rank change
1 | HLAMMb (13) | ESRDDXSIMPc (18) | | RAGETXd (16) | +77
2 | VINTAGEe (12) | VINTAGE (11) | –8 | RDM2f (16) | >+100
3 | ESRDDXSIMP (9) | HLAMM (10) | –23 | ESRDDXSIMP (13) | –27
4 | DRCMVg (8) | RAGETX (9) | +78 | FUNCTSTATh (12) | >+100
5 | DRRACEi (8) | DAGEj (6) | 50 | DAGE (5) | –16
6 | FUNCTSTAT (7) | DRRACE (4) | –100 | DRRACE (3) | –25
7 | DAGE (4) | FUNCTSTAT (4) | –75 | ECDk (3) | >+100
8 | RCADl (4) | PKPRAm (3) | +50 | VINTAGE (2) | >–100
9 | RDM2 (3) | PREEMPTIVEn (3) | +50 | PREEMPTIVE (2) | –33
10 | RAGETX (2) | DRCMV (2) | –75 | RWT2o (2) | 0
Rest | Rest (31) | Rest (30) | Rest (30) | Rest (26) | Rest (26)
aImportance (%) and rank change are shown in italics.
bHLAMM: HLA antigen mismatch.
cESRDDXSIMP: simplified end-stage renal disease diagnosis.
dRAGETX: recipient age.
eVINTAGE: number of years on dialysis before transplant.
fRDM2: recipient diabetes status.
gDRCMV: donor-recipient cytomegalovirus.
hFUNCTSTAT: functional status of the recipient.
iDRRACE: donor-recipient race.
jDAGE: donor age.
kECD: expanded criteria donor.
lRCAD: recipient coronary artery disease.
mPKPRA: peak panel reactive antibody.
nPREEMPTIVE: pre-emptive transplant.
oRWT2: recipient weight.
Below, we analyze the importance of features in each cohort and show the influence of features over time using nonoverlapping cohorts.
According to the top features shown in
Both HLAMM and VINTAGE remained highly important in cohort 2. In addition, ESRDDXSIMP was noted as a highly important feature. Interestingly, a few features, such as donor age and recipient age, were rather insignificant in cohort 1 but became significant in cohort 2 and even more so in cohort 3.
ESRDDXSIMP showed a relative downward trend; however, it remained a highly significant feature. Unlike in the earlier cohorts, HLAMM and VINTAGE did not maintain their importance in the long term, whereas the recipient's diabetes status was the most important feature, along with recipient age and functional status. Donor age maintained a medium importance score of between 5% and 10%.
Changing relevance of top 25 features over the three cohorts. AHD1: donor-recipient height difference; CIT: cold ischemia time; DAGE: donor age; DHT100: donor height; DHTN2: donor hypertension; DONCREAT: donor creatinine level; DRACESIMP: donor race; DRCMV: donor-recipient cytomegalovirus; DRRACE: donor-recipient race; DRWT: donor-recipient weight difference; DWT: donor weight; ECD: expanded criteria donor; ESRDDXSIMP: simplified end-stage renal disease diagnosis; FUNCTSTAT: functional status of the recipient; HLAMM: number of HLA mismatches; PKPRA: peak panel reactive antibody; PREEMPTIVE: preemptive transplant; RAGETX: recipient age; RCAD: recipient coronary artery disease; RDM2: recipient diabetes; RHT2100: recipient height; RHTN: recipient hypertension; RRACESIMP: recipient race; RWT2: recipient weight; VINTAGE: number of years on dialysis before transplantation.
Analysis of the values of categorical features provided novel insights into the influence of a feature.
Changing relevance of top 25 features (including dummy features) over the three cohorts. CIT: cold ischemia time; DAGE: donor age; DBMI: donor BMI; DHT100: donor height; DONCREAT: donor creatinine level; DRCMV_2: Donor positive recipient positive; DRRACE_1: Donor white recipient white; DWT: donor weight; ECD_0: Expanded criteria donor: no; ECD_1: Expanded criteria donor: yes; ESRDDXSIMP_2: End stage renal disease: diabetes mellitus; ESRDDXSIMP_3: End stage renal disease: polycystic kidney disease; ESRDDXSIMP_4: End stage renal disease: hypertension; FUNCTSTAT_1: Functional status of recipient: 100% no complaints; HLAMM _5: Number of human leukocyte antigen mismatches: 5; PKPRA: peak panel reactive antibody; PREEMPTIVE_1: Preemptive transplant: yes; PREEMPTIVE_2: Preemptive transplant: no; RAGETX: recipient age; RBMI2: recipient BMI; RDM2_0: Recipient diabetes: no; RDM2_1: Recipient diabetes: yes; RHT2100: recipient height; RWT2: recipient weight; VINTAGE: number of years on dialysis before transplantation.
The cross-cohort prediction results (
We compared the prediction performance of our ML-based prediction models with comparable organ transplant studies that involved similar-sized observations and temporal windows.
Prediction scores of similar studies.
Time | Study | Model | Size | Data set | Metric | Score (%) | Our score (%) |
1 year | Lin et al [ | ANNa and LRb | 46,414 | UNOSc | AUCd | 73 | 82
1 year | Dag et al [ | LR | 15,580 | UNOS | AUC | 63 | 82
5 years | Tiong et al [ | Nomogram | 20,085 | UNOS | C-indexe | 71 | 69
5 years | Lin et al [ | ANN | 17,856 | UNOS | AUC | 77 | 69
7 years | Lin et al [ | ANN | 10,250 | UNOS | AUC | 82 | 81
14 years | Luck et al [ | ANN | 46,098 | SRTRf | C-index | 65 | 81
aANN: artificial neural network.
bLR: logistic regression.
cUNOS: United Network for Organ Sharing.
dAUC: area under the curve.
eC-index: concordance index.
fSRTR: Scientific Registry of Transplant Recipients.
When comparing our results with prior studies, it is noted that although our cohort 2 prediction performance (ie, graft status prediction over a 5-year period) is lower than that of Lin et al [
A limitation of our research lies in the removal of censored instances. We removed all successful cases that were censored before 8 years following transplant. Although this type of approach has previously been used, including censored cases is a potential consideration for future analyses.
Understanding the impact of donor and recipient factors that predict short- and long-term kidney transplant allograft survival is important for patients and providers. Kidney transplantation is the optimal form of kidney replacement therapy, but kidney allografts are a limited resource. In addition, the alternative to kidney transplantation (ie, dialysis) is considerably costlier.
In this study, we present an ML-based framework to predict the status of kidney allografts, based on donor-recipient features, over a period of 17 years. We applied ML-based data analysis methods for feature engineering to reduce data dimensionality, develop prediction models for three distinct temporal cohorts, and investigate the changing relevance of clinical features across different temporal cohorts. We introduced the concept of nonoverlapped cohorts to analyze the changing relevance of features in three defined periods. In conclusion, our results emphasize that ML can be effective in predicting graft survival using donor and recipient factors that are routinely collected as part of patient care. As a next step, we plan to incorporate the prediction models into clinical care at the time of allocation; models that best predict short- and long-term kidney graft survival may be used as a pragmatic prognostic tool to aid clinicians in maximizing the best possible matching of donors and recipients while preserving existing allocation rules that are used to promote equity [
Supplementary tables.
adaptive boosting
artificial neural network
area under the curve
cytomegalovirus
end-stage renal disease diagnosis
logistic regression
machine learning
random forest
Synthetic Minority Oversampling Technique
Scientific Registry of Transplant Recipients
support vector machine
number of years on dialysis before transplant
The data reported here have been supplied by the Hennepin Healthcare Research Institute as the contractor for the SRTR. The interpretation and reporting of these data are the responsibility of the authors and in no way should be seen as an official policy of or interpretation by the SRTR or the US Government.
SAAN and SSRA were responsible for the overall data analysis methodology, data analysis using ML algorithms, evaluation of the data analysis results, and writing of the manuscript. KT and AV provided clinical expertise in defining the problem, interpretation of the data, preprocessing of the data and interpretation of the results, providing the data from the data source, and editing the manuscript for clinical clarity and purpose. PCR facilitated the setting up and conducting of data analysis experiments. All authors critically reviewed the manuscript for scientific content and approved the final manuscript for publication.
None declared.