Journal of Medical Internet Research (J Med Internet Res, JMIR), ISSN 1438-8871

JMIR Publications, Toronto, Canada

Article v24i8e37486; PMID 35921141; DOI 10.2196/37486

Original Paper

Improving the Performance of Outcome Prediction for Inpatients With Acute Myocardial Infarction Based on Embedding Representation Learned From Electronic Medical Records: Development and Validation Study

Eysenbach

Gunther

Nuutinen

Mikko

Jorm

Louisa

Qian

Buyue

Yanqun Huang, BS 1,2 (https://orcid.org/0000-0001-8111-6352)

Zhimin Zheng, BS 1,2 (https://orcid.org/0000-0002-6656-7195)

Moxuan Ma, MS 1,2 (https://orcid.org/0000-0002-9941-5569)

Xin Xin, MS 1,2 (https://orcid.org/0000-0002-0315-3518)

Honglei Liu, PhD 1,2 (https://orcid.org/0000-0001-5518-4749)

Xiaolu Fei, PhD 3 (https://orcid.org/0000-0001-7498-0249)

Lan Wei, PhD 3 (https://orcid.org/0000-0003-1128-1314)

Hui Chen, PhD 1 (https://orcid.org/0000-0002-0048-0193)

1 School of Biomedical Engineering, Capital Medical University, Beijing, China

2 Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing, China

3 Information Center, Xuanwu Hospital, Capital Medical University, Beijing, China

Corresponding Author: Hui Chen, School of Biomedical Engineering, Capital Medical University, No. 10, Xitoutiao, You An Men, Fengtai District, Beijing, 100069, China. Phone: 86 01083911545. Email: chenhui@ccmu.edu.cn

8 2022

3 8 2022

24 8

e37486

22 2 2022 20 4 2022 2 6 2022 18 7 2022

©Yanqun Huang, Zhimin Zheng, Moxuan Ma, Xin Xin, Honglei Liu, Xiaolu Fei, Lan Wei, Hui Chen. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 03.08.2022.

2022

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.

Background

The widespread secondary use of electronic medical records (EMRs) promotes health care quality improvement. Representation learning that can automatically extract hidden information from EMR data has gained increasing attention.

Objective

We aimed to propose a patient representation with more feature associations and task-specific feature importance to improve the outcome prediction performance for inpatients with acute myocardial infarction (AMI).

Methods

Medical concepts, including patients’ age, gender, disease diagnoses, laboratory tests, structured radiological features, procedures, and medications, were first embedded into real-valued vectors using an improved skip-gram algorithm, in which the concepts in the context windows were selected according to feature association strengths measured by association rule confidence. Then, each patient was represented as the sum of the feature embeddings weighted by the task-specific feature importance, which also supported interpretation of the predictive model from global and local perspectives. Finally, we applied the proposed patient representation to mortality risk prediction for 3010 and 1671 AMI inpatients from a public data set and a private data set, respectively, and compared it with several reference representation methods in terms of the area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), and F1-score.

Results

Compared with the reference methods, the proposed embedding-based representation showed consistently superior predictive performance on the 2 data sets, achieving mean AUROCs of 0.878 and 0.973, AUPRCs of 0.220 and 0.505, and F1-scores of 0.376 and 0.674 for the public and private data sets, respectively, while the greatest AUROCs, AUPRCs, and F1-scores among the reference methods were 0.847 and 0.939, 0.196 and 0.283, and 0.344 and 0.361 for the public and private data sets, respectively. Feature importance integrated in patient representation reflected features that were also critical in prediction tasks and clinical practice.

Conclusions

The introduction of feature associations and feature importance facilitated an effective patient representation and contributed to prediction performance improvement and model interpretation.

Keywords: representation learning; skip-gram; feature association strengths; feature importance; mortality risk prediction; acute myocardial infarction

Introduction

Electronic medical records (EMRs) contain diverse and heterogeneous information, such as demographic data, disease diagnoses, laboratory tests, radiological findings, examinations and procedures, and medications. EMR data can be used to not only reflect the health status of patients and record the treatment trajectory, but also help doctors in making clinical decisions [1-6] and improving the efficiency of diagnosis and treatment [1,7,8]. One of the most prevalent and practical tasks of the secondary use of EMR data is building models to predict the disease status [8-10] and treatment outcomes [11-17] for a patient, using machine learning algorithms.

However, the high dimensionality, sparsity, and heterogeneity of EMR data [12,18] pose many obstacles to directly inputting the raw data into machine learning–based predictive models. Some manual and data-driven feature engineering methods [15,19], though time-consuming and laborious, have been used to select important features or extract useful information for predictive tasks. Moreover, the performance of predictive models relies heavily on the representation of data. It was reported that effective representation methods could make the downstream modeling simpler and more flexible, and greatly improve the predictive performance [18,20]. By transforming raw features into compact vectors, representation learning can make it easier to automatically extract useful information when building predictive models [16,21,22]. One widely used representation method for EMR data is the skip-gram algorithm [23], a distributed embedding method that treats patient records as sentences and medical concepts as words. An inherent problem in applying the skip-gram algorithm is that, unlike words within a sentence, medical concepts in a patient’s record do not have a natural order, making it difficult to learn meaningful representations of concepts that have potential associations. One solution to this problem was to randomly shuffle the concepts within a record before learning concept embeddings [12,24-26]. This could reduce the impact of the unordered nature of medical concepts on the algorithm to some degree, but associations among these concepts were still not taken into consideration.

Acute myocardial infarction (AMI) is an acute ischemic heart disease and is the second leading cause of death. One in every 6 deaths is caused by ischemic heart disease, of which AMI accounts for the majority [27,28]. Mortality risk prediction for AMI patients plays a crucial role in clinical work, helping doctors identify potential clinical factors, take early intervention measures based on timely alerts of patients’ adverse health statuses, and reduce the burden of related health care expenses. Therefore, researchers [19,29-31] have focused on building machine learning models for the outcome prediction of AMI patients, and most of them used specific clinical features, such as laboratory test results (eg, albumin), comorbidities (eg, diabetes), and demographic data (eg, gender).

In this study, we aimed to represent various structured features extracted from EMR data as fixed-length embedding vectors, which were then used to improve the performance of predictive models for the death risk of AMI patients. Specifically, we introduced the association strengths into the skip-gram algorithm to learn more informative representations of features. We also introduced the Shapley additive explanations (SHAP) [32] technique to facilitate representation at the patient level and enhance the interpretability of the predictive model. An overview of our proposed representation learning framework and its application is shown in Figure 1.

Figure 1

Overview of the proposed representation learning method for patients’ mortality risk prediction. First, feature representations were learned by the skip-gram algorithm using an adaptive context window. Then, patient representations were constructed based on feature representations weighted by the feature importance. Finally, the proposed patient representation was applied in the mortality risk prediction for acute myocardial infarction inpatients from a public data set and a private data set, and compared with reference methods.

Methods

Skip-Gram–Based Patient Representation

The representation was learned hierarchically at the following 3 levels: the concept, feature, and patient levels. At the concept level, we employed the improved skip-gram algorithm [23] to represent a concept as an embedding vector. In the natural language processing domain, the basic idea of skip-gram was to maximize the occurrence probabilities of the target words and the context words in the predefined context window, making the words that appear in the same context window closer in the embedding space. Unlike words with natural orders in a sentence, medical concepts appeared out of order in a patient record for a certain hospital stay. This made it difficult to determine the context window that contained relevant concepts for the target concept, especially when the number of concepts in a record was far larger than the size of the context window. Thus, for a concept within a record, we identified relevant concepts using its association strength with a candidate concept in the same record. The association strength was defined as the confidence (equation 1) of an association rule with one candidate concept as the unique antecedent (or consequent).

Confidence (C1, C2) = |C1∩C2| / |C1| (1)

where C1 and C2 are the antecedent and consequent concepts, respectively, of an association rule C1→C2, and |C1| and |C1∩C2| are the numbers of patient records containing C1 and both C1 and C2, respectively. The greater the confidence, the stronger the association between the 2 concepts. Antecedent (or consequent) concepts in association rules with the top N highest confidences were included in the context window of the target concept. We called these selection schemes of context concepts antecedent-based (or consequent-based) embeddings. Figure 2 provides an example of the consequent-based selection scheme of context concepts.
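As a rough illustration of equation 1, the sketch below counts single-concept and pairwise occurrences across records and derives the confidence of every rule C1→C2. The function name `confidence_matrix` and the toy records are hypothetical and are not taken from either data set; the study's own implementation is not described at this level of detail.

```python
from itertools import permutations


def confidence_matrix(records):
    """Return {(c1, c2): confidence of the rule c1 -> c2} for a list of records.

    Each record is an iterable of concept names for one hospital stay.
    """
    count = {}       # |C1|: number of records containing each concept
    pair_count = {}  # |C1 ∩ C2|: number of records containing both concepts
    for record in records:
        concepts = set(record)
        for c in concepts:
            count[c] = count.get(c, 0) + 1
        for c1, c2 in permutations(concepts, 2):
            pair_count[(c1, c2)] = pair_count.get((c1, c2), 0) + 1
    return {(c1, c2): n / count[c1] for (c1, c2), n in pair_count.items()}


# Hypothetical toy records, for illustration only.
records = [
    {"AMI", "hypertension", "heparin"},
    {"AMI", "hypertension"},
    {"AMI", "diabetes mellitus"},
    {"AMI"},
    {"hypertension", "heparin"},
]
conf = confidence_matrix(records)
print(conf[("AMI", "hypertension")])  # 2 of the 4 AMI records contain hypertension
print(conf[("hypertension", "AMI")])  # 2 of the 3 hypertension records contain AMI
```

The two printed values differ, which shows why the antecedent-based and consequent-based schemes can select different context concepts for the same target.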

Figure 2

An illustration of context concept selection for the skip-gram algorithm using association strengths. All records are composed of 10 concepts (C1, C2, …, and C10). In the confidence matrix, element Cij is the confidence of the association rule with Cj as antecedent and Ci as consequent. For patient 1 with 6 concepts (C1, C3, C6, C7, C8, and C10), the concepts included in C1’s 4-concept context window were selected from the remaining 5 candidate concepts, whose confidences were 0.66 (antecedent, C10), 0.62 (C3), 0.55 (C6), 0.53 (C8), and 0.46 (C7). Therefore, C10, C3, C6, and C8 were selected to construct the context window for C1.
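The selection in Figure 2 can be reproduced with a few lines of code. This is a minimal sketch: the helper `select_context` and the `(target, candidate)` key layout are assumptions made for illustration, while the five confidence values are the ones quoted in the caption.

```python
def select_context(target, record, strength, window_size):
    """Pick the `window_size` concepts in `record` most strongly associated with `target`."""
    candidates = [c for c in record if c != target]
    ranked = sorted(
        candidates,
        key=lambda c: strength.get((target, c), 0.0),  # missing pairs count as 0
        reverse=True,
    )
    return ranked[:window_size]


# Association strengths between C1 and its candidates, as quoted in the Figure 2 caption.
strength = {
    ("C1", "C10"): 0.66,
    ("C1", "C3"): 0.62,
    ("C1", "C6"): 0.55,
    ("C1", "C8"): 0.53,
    ("C1", "C7"): 0.46,
}
record = ["C1", "C3", "C6", "C7", "C8", "C10"]  # patient 1

context = select_context("C1", record, strength, window_size=4)
print(context)  # ['C10', 'C3', 'C6', 'C8']

# (target, context) pairs that a skip-gram model would then be trained on.
pairs = [("C1", c) for c in context]
```

C7 is dropped because it has the weakest association with C1 among the 5 candidates.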

Moreover, to reduce the high dimensionality and sparsity of a large number of concepts, while preserving the clinical information as much as possible, we aggregated the concepts of disease diagnoses and procedures into several clinically meaningful feature groups according to International Classification of Diseases, 10th revision (ICD-10) codes and International Classification of Diseases, 9th revision (ICD-9) codes, and with the help of clinical experts. For example, disease diagnoses of type 1 diabetes mellitus and type 2 diabetes mellitus were grouped into the feature group of diabetes mellitus. The average of all embedding vectors of concepts from the same feature group in a patient record was treated as the representation at the feature level for the patient.
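The feature-level averaging described above might look like the following sketch. The 3-dimensional embeddings, the concept-to-group mapping, and the function name `feature_representation` are hypothetical; the study used 300-dimensional embeddings and expert-guided groupings.

```python
def feature_representation(record_concepts, concept_to_group, embeddings):
    """Average the embeddings of a record's concepts within each feature group."""
    grouped = {}
    for concept in record_concepts:
        grouped.setdefault(concept_to_group[concept], []).append(embeddings[concept])
    return {
        group: [sum(values) / len(vectors) for values in zip(*vectors)]
        for group, vectors in grouped.items()
    }


# Hypothetical 3-dimensional concept embeddings, for illustration only.
embeddings = {
    "type 1 diabetes mellitus": [0.25, 0.5, -0.75],
    "type 2 diabetes mellitus": [0.75, 0.0, -0.25],
    "hypertension": [1.0, -1.0, 0.5],
}
concept_to_group = {
    "type 1 diabetes mellitus": "diabetes mellitus",
    "type 2 diabetes mellitus": "diabetes mellitus",
    "hypertension": "hypertension",
}

features = feature_representation(
    ["type 1 diabetes mellitus", "type 2 diabetes mellitus", "hypertension"],
    concept_to_group,
    embeddings,
)
print(features["diabetes mellitus"])  # [0.5, 0.25, -0.5]
```

A group with a single concept in the record keeps that concept's embedding unchanged.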

The representation at the patient level was the weighted sum of feature-level representations. The feature weights were obtained under the guidance of the predictive task, indicating the importance of each feature involved in the patient representation. In this study, we used SHAP values as the feature weights. The SHAP framework is a machine learning interpretation technique based on the idea of game theory. It approximated a trained prediction model with a different but simple model that could easily calculate the contribution in the form of a SHAP value for each feature in the prediction model and performed additive feature attribution to explain the combination of features [32]. A positive or negative SHAP value reflected a positive or negative influence on the prediction. A feature’s importance was then computed as the average of its absolute SHAP values from all samples.
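The two steps in this paragraph, computing a feature's importance as its mean absolute SHAP value and then forming the weighted sum, can be sketched as follows. All numbers are made up, the function names are hypothetical, and the sketch assumes that features absent from a patient's record simply contribute nothing to the sum.

```python
def feature_importance(shap_rows):
    """Mean absolute SHAP value per feature; `shap_rows` holds one {feature: value} dict per sample."""
    features = shap_rows[0].keys()
    return {f: sum(abs(row[f]) for row in shap_rows) / len(shap_rows) for f in features}


def patient_representation(feature_vectors, importance):
    """Sum a patient's feature-level vectors, each scaled by its feature importance."""
    dim = len(next(iter(feature_vectors.values())))
    patient = [0.0] * dim
    for feature, vector in feature_vectors.items():
        for i, value in enumerate(vector):
            patient[i] += importance[feature] * value
    return patient


# Hypothetical SHAP values for 2 training samples and 2 features.
shap_rows = [
    {"hypertension": 0.5, "diabetes mellitus": -0.25},
    {"hypertension": -0.25, "diabetes mellitus": 0.25},
]
importance = feature_importance(shap_rows)
print(importance)  # {'hypertension': 0.375, 'diabetes mellitus': 0.25}

# Hypothetical 2-dimensional feature-level vectors for one patient.
feature_vectors = {"hypertension": [1.0, -1.0], "diabetes mellitus": [0.5, 2.0]}
print(patient_representation(feature_vectors, importance))  # [0.5, 0.125]
```

Taking absolute values before averaging means that a feature with large positive and negative contributions still receives a high weight.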

Experiments and Evaluations

Data Sets and Data Preprocessing

In this study, we used a public data set, the freely accessible critical care database Medical Information Mart for Intensive Care III (MIMIC-III data set [33]), and a private data set for the experiments.

The MIMIC-III data set was collected between June 2001 and October 2012, and involved 46,520 patients admitted to intensive care units at the Beth Israel Deaconess Medical Center in Boston, Massachusetts. It includes patient health information, such as demographics, vital signs, laboratory test results, medications, procedures, diagnosis codes, and clinical notes. The MIMIC-III data set has been widely used for medical machine learning modeling and algorithm evaluation, providing strong data support for researchers [14,18].

The private data set was derived from the EMR system of a tertiary hospital, Xuanwu Hospital, Capital Medical University, Beijing, China, between January 2014 and December 2016. Patient features included hospital admission and discharge information, demographic data, disease diagnoses, laboratory tests, examinations and procedures, medications, and radiology reports of chest X-ray or color sonography examination.

We extracted the records of all 3010 and 1671 AMI patients from the public and private data sets, respectively. The diagnosis of AMI was confirmed with the ICD-9 codes 410.01 to 410.91 or ICD-10 codes I21 and I22. There were 254 (8.1%) and 103 (6.2%) patients who died in the hospital from the public and private data sets, respectively.

We maintained patients’ first hospitalization data to evaluate the proposed method. Demographic data (age and gender) and the following AMI-related features were maintained in both data sets: AMI-relevant items of laboratory tests that at least 95% of patients carried out, AMI-relevant radiological features extracted from radiology reports [34], 7 commonly prescribed medications, and all recorded disease diagnoses and procedures. For laboratory tests performed more than once, only the results obtained in the first test (usually at admission) were retained, which could reflect a patient’s health status and the severity of illness.

The skip-gram algorithm was originally proposed in the field of natural language processing to train embeddings for discrete words or symbols. Therefore, to use the skip-gram algorithm for the embedding representation of structured data, all patient features must be categorical, with each discrete value treated as a concept. For example, male and female were 2 concepts for gender. Unlike raw categorical features, such as gender, disease diagnoses, procedures, and medications, which could remain unchanged, the continuous variables age and laboratory test results had to be discretized into 2 or more concepts. Age was discretized into 2 concepts (>60 years and ≤60 years). Each laboratory test result was also discretized into 2 concepts (normal and abnormal with reference to clinical standards). In total, 3326 and 1073 medical concepts were identified and further aggregated to 104 and 108 feature groups in the public and private data sets, respectively (Table 1). All feature groups of the private and public data sets are listed in Multimedia Appendix 1 and Multimedia Appendix 2, respectively.
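The discretization step can be pictured with a short sketch. The helper names, the record layout, and especially the reference range are placeholders chosen for illustration; they are not the clinical standards used in the study.

```python
def discretize_age(age_years):
    """Map age to one of the 2 age concepts."""
    return "age >60 years" if age_years > 60 else "age ≤60 years"


def discretize_lab(test_name, value, reference_range):
    """Map a laboratory result to a 'normal ...' or 'abnormal ...' concept."""
    low, high = reference_range
    status = "normal" if low <= value <= high else "abnormal"
    return f"{status} {test_name}"


def to_concepts(patient, reference_ranges):
    """Turn one patient's structured record into a list of categorical concepts."""
    concepts = [discretize_age(patient["age"]), patient["gender"]]
    for test_name, value in patient["labs"].items():
        concepts.append(discretize_lab(test_name, value, reference_ranges[test_name]))
    concepts.extend(patient["other"])  # diagnoses, procedures, medications stay as they are
    return concepts


# PLACEHOLDER range for illustration only, not a clinical reference standard.
reference_ranges = {"serum creatinine": (50.0, 110.0)}
patient = {
    "age": 67,
    "gender": "male",
    "labs": {"serum creatinine": 140.0},
    "other": ["hypertension", "heparin"],
}
concepts = to_concepts(patient, reference_ranges)
print(concepts)
```

Each resulting string plays the role of a word, and the whole list plays the role of a sentence for the skip-gram algorithm.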

Table 1

Concepts and feature groups of both the public and private data sets.

Feature category	Public data set: feature groups (n=104), n	Public data set: concepts (n=3326), n	Private data set: feature groups (n=108), n	Private data set: concepts (n=1073), n	Concept examples
Age	1	2	1	2	>60 years and ≤60 years
Gender	1	2	1	2	Male and female
Laboratory tests	19	38	40	80	Abnormal serum triglyceride and normal serum creatinine
Radiological features	34	34	36	36	Cardiac image enlargement and sharp costophrenic angle
Disease diagnoses	24	2600	15	739	Hypertension and brainstem infarction
Procedures	18	643	8	207	Coronary stenting and pericardiocentesis
Medications	7	7	7	7	Angiotensin-converting enzyme inhibitor and heparin

Representation Evaluation

To evaluate the effectiveness and advantages of the proposed representation, we used 2 additional kinds of simple reference representation methods, namely, the 3-layer autoencoder with learning and the feature selection method without learning. Table 2 describes the details of the proposed and reference representation methods.

The proposed representation method was first evaluated at the concept level. Cluster analyses were used to cluster laboratory test concepts into 2 clusters for the quantitative evaluation. The adjusted Rand index (ARI) [35] (ranging from −1 to 1) was used to evaluate the cluster solutions. Greater ARI values indicated a greater ability to discriminate between categories with different real labels (normal and abnormal). We also applied the t-distributed stochastic neighbor embedding algorithm to project the embedding vectors of laboratory test concepts into a 2-dimensional space to visually observe the distribution of embeddings.
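For readers unfamiliar with the ARI, the sketch below computes it from the pair-counting definition using only the standard library (`math.comb` requires Python 3.8 or later). It is an illustrative reimplementation, not the code used in the study, and the label lists are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index of two labelings of the same items (at least 2 items)."""
    n = len(labels_true)
    # Pairs of items that share both the true label and the predicted cluster.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case: both labelings are trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical labels for 4 laboratory test concepts.
true_labels = ["normal", "normal", "abnormal", "abnormal"]
print(adjusted_rand_index(true_labels, [1, 1, 0, 0]))  # clusters match the labels exactly
print(adjusted_rand_index(true_labels, [0, 1, 0, 1]))  # clusters cut across the labels
```

Cluster identifiers are arbitrary, so a clustering that matches the labels up to renaming still scores 1, while agreement worse than chance gives a negative value.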

The proposed representation method was then evaluated at the patient level with a downstream prediction task using the logistic regression model. The predicted outcome was the in-hospital death of AMI patients. The input for prediction was the patient representation derived from the entire feature set listed in Table 1. We also extracted a treatment-free feature subset that excluded medications and procedures from the entire feature set. This subset was used to examine whether the performance of the proposed patient representation depended on the features involved in the representation and whether the treatment-related features played a crucial role in predicting patient outcome even after they had been represented as embedding vectors.

Table 2

Descriptions of the proposed and reference representation methods.

Representation method	Descriptions	Representation examples
Mixture	The mixture of discretization codes for original discrete features and original values for continuous features. The missing values in the laboratory tests were interpolated using the mean of the corresponding laboratory tests.	(0,1,1,0,0,0,1,12,8.5,3,8) for a patient with 11 features
Discretization	The 0-1 vector where the digit 1 represented the patient having the specific disease, procedure, radiological feature, and medication, and 0 otherwise. Age of 1 meant >60 years and 0 meant ≤60 years, gender of 1 meant male and 0 meant female, and a laboratory test item of 1 meant abnormal and 0 meant normal. Missing values for laboratory tests were interpolated by the corresponding mode.	(0,1,1,0,0,0,1,1,0,1,1) for a patient with 11 discretization features
DIS_FS^a	The selected features with discretization representations were statistically different between patients with and without the label “death.”	(0,0,1,0,0,1,0,1) for a patient with 8 selected features
DIS_AE^b	The hidden-layer vector of a 3-layer autoencoder with discretization vectors as inputs and outputs. The dimension of the hidden layer was set to 64.	(0.7,1.9,0.5,−1,−3.1,2.4) for a patient with a 6-dimensional vector
RAN_EM_AVE^c	The average of feature embedding vectors learned from the skip-gram algorithm using the random selection method to determine the context window.	(1.6,−0.5,1.1,0.1,−1.3,0.6) for a patient with a 6-dimensional embedding vector
RAN_EM_WGT^d	The weighted sum of the feature embedding vectors learned from the skip-gram algorithm using the random selection method to determine the context window.	(1.2,−0.9,1.3,0.4,−1.9,1.0) for a patient with a 6-dimensional embedding vector
ANT_EM_AVE^e	The average of the feature embedding vectors learned from the skip-gram algorithm using the confidence with the target concept as the antecedent.	(0.9,−0.6,1.2,1.4,−1.9,0.6) for a patient with a 6-dimensional embedding vector
ANT_EM_WGT^f	The weighted sum of the feature embedding vectors learned from the skip-gram algorithm using the confidence with the target concept as the antecedent.	(1.2,−1.5,1.1,0.1,−0.6,0.6) for a patient with a 6-dimensional embedding vector
CON_EM_AVE^g	The average of the feature embedding vectors learned from the skip-gram algorithm using the confidence with the target concept as the consequent.	(1.6,−0.8,2.1,1.6,−1.4,1.5) for a patient with a 6-dimensional embedding vector
CON_EM_WGT^h	The weighted sum of the feature embedding vectors learned from the skip-gram algorithm using the confidence with the target concept as the consequent.	(1.1,−0.4,−0.7,1.6,−0.3,0.9) for a patient with a 6-dimensional embedding vector

^aDIS_FS: discretization representations with feature selection.

^bDIS_AE: hidden vector of an autoencoder-based representation.

^cRAN_EM_AVE: average of the random selection–based embedding representation.

^dRAN_EM_WGT: weighted sum of the random selection–based embedding representation.

^eANT_EM_AVE: average of the antecedent-based embedding representation.

^fANT_EM_WGT: weighted sum of the antecedent-based embedding representation.

^gCON_EM_AVE: average of the consequent-based embedding representation.

^hCON_EM_WGT: weighted sum of the consequent-based embedding representation.
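Table 2 states that missing laboratory values were filled with the mean for the Mixture representation and with the mode for the Discretization representation. A minimal sketch of both strategies follows; the function name `impute` and the sample values are hypothetical.

```python
from statistics import mean, mode


def impute(values, strategy):
    """Replace None entries with the mean or the mode of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else mode(observed)
    return [fill if v is None else v for v in values]


# Hypothetical column of continuous results (Mixture representation).
raw_results = [8.0, None, 12.0, 10.0]
print(impute(raw_results, "mean"))  # [8.0, 10.0, 12.0, 10.0]

# Hypothetical column of 0/1 abnormality flags (Discretization representation).
abnormal_flags = [1, 0, None, 1]
print(impute(abnormal_flags, "mode"))  # [1, 0, 1, 1]
```

The mode keeps the imputed value inside the set {0, 1}, which the mean would not do for a binary column.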

We randomly split samples into training and test data sets by the ratio of 7:3. The training samples were first represented in the discretization vectors and used to build a predictive model for calculating all features’ SHAP values for the further patient embedding representations of all study samples. After being represented as embedding vectors, the training and test samples were used to build and validate a logistic regression-based predictive model, respectively. The area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), and F1-score were the main evaluation metrics. Other relevant performance metrics from the confusion matrix included precision, recall, and accuracy. To eliminate the performance bias introduced by the initialization of a skip-gram model and the training/test data set split, we performed the comparative experiment 100 times. In each experiment round, the above processes were repeated. The mean with its 95% CI of each performance evaluation metric was reported.
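The confusion-matrix metrics named above (precision, recall, accuracy, and F1-score) follow directly from the counts of true and false positives and negatives. The sketch below uses hypothetical labels with death coded as 1; the function name is an assumption made for illustration.

```python
def confusion_metrics(y_true, y_pred):
    """Precision, recall, F1-score, and accuracy for binary labels (1 = death)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical test set: 2 deaths among 10 patients.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
metrics = confusion_metrics(y_true, y_pred)
print(metrics)  # {'precision': 0.5, 'recall': 0.5, 'f1': 0.5, 'accuracy': 0.8}
```

In this toy example the accuracy is 0.8 although only half of the deaths are detected, which illustrates why accuracy alone can be misleading when the positive class is rare.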

In the skip-gram algorithm, the size of the context window and the dimension of the embedding vector were determined by trial and error. We conducted a group of predictive experiments on the public data set, using all combinations of window sizes of 5, 10, 15, and 20, and vector dimensions of 50, 100, 200, and 300. Experimental results (listed in Multimedia Appendix 3) showed that the skip-gram algorithm with the combination of a window size of 10 and a vector dimension of 300 had the highest representation performance. Therefore, the size of the context window and the dimension of the embedding vector were set to 10 and 300, respectively. We applied the negative sampling mechanism (20 negative samples in this study) to accelerate the concept embedding training process. Other parameters were as follows: learning rate, 0.001; number of iterations, 50; batch size, 64. The Adam optimizer was used for gradient-based parameter updates. We implemented representation learning, SHAP value computation, and prediction modeling in Python 3.7 and TensorFlow 2.0. In the step of patient representation, we used the L2 regularization penalty with the “liblinear” solver for the logistic regression model, and the inverse of regularization strength was set to 0.1.
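The search space and the final settings reported in this paragraph can be written down as a small configuration sketch. The dictionary keys are hypothetical names chosen for readability; the logistic regression keys mirror the argument names of scikit-learn's `LogisticRegression`, which is an assumption, since the text does not name the library used for that model.

```python
from itertools import product

# Search space reported for the skip-gram algorithm.
window_sizes = [5, 10, 15, 20]
vector_dims = [50, 100, 200, 300]
grid = list(product(window_sizes, vector_dims))
print(len(grid))  # 16 combinations

# Settings reported as final (key names are illustrative).
skip_gram_settings = {
    "window_size": 10,
    "embedding_dim": 300,
    "negative_samples": 20,
    "learning_rate": 0.001,
    "iterations": 50,
    "batch_size": 64,
    "optimizer": "Adam",
}

# Logistic regression settings; C is the inverse of the regularization strength.
logistic_regression_settings = {"penalty": "l2", "solver": "liblinear", "C": 0.1}

chosen = (skip_gram_settings["window_size"], skip_gram_settings["embedding_dim"])
print(chosen in grid)  # True
```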

Ethics Approval

The study was approved by the Human Research Ethics Committees of Xuanwu Hospital, Capital Medical University (approval number: Clinical Scientific Research 2020-070).

Results

Concept Representation Evaluation

Embedding vectors for laboratory test concepts were visualized in a 2-dimensional space (Figure 3). Concepts of normal and abnormal laboratory tests were farther apart when they were represented by the consequent-based embeddings (Figures 3A and 3D) than by the antecedent-based embeddings (Figures 3B and 3E) and the random selection–based embeddings (Figures 3C and 3F). In cluster analyses for laboratory tests, the consequent-based embeddings achieved higher ARIs (0.317 and 0.520 on the public and private data sets, respectively) than the antecedent-based embeddings (0.112 and 0.149, respectively) and the random selection–based embeddings (0.043 and 0.028, respectively). The best cluster performance of the consequent-based embeddings among the 3 embeddings indicated that the consequent-based embeddings might contain more feature association information.

Figure 3

Visualization of the embeddings of laboratory test concepts using different selection schemes for contextual concepts in the skip-gram algorithm (the t-distributed stochastic neighbor embedding algorithm was used). Dots in red and green represent abnormal and normal laboratory test results, respectively. A to C for the public data set: the contextual concepts of a target concept consist of its consequent concepts (A) or antecedent concepts (B) in association rules, or randomly selected concepts (C). D to F are the counterparts of A to C on the private data set.

Predictive Performance

Table 3 and Multimedia Appendix 4 list the predictive performances using various representation methods on the private and public data sets, respectively. The proposed representation method, the weighted sum of the consequent-based embedding representation (CON_EM_WGT), showed the highest predictive performances, with maximum AUROCs of 0.878, 0.973, and 0.926 using all features of the public data set and the entire and treatment-free feature sets of the private data set, respectively. When the performance was measured by AUPRC and F1-score, the proposed representation method outperformed all the other methods regardless of the data sets and feature sets.

Compared with the reference representations, most of the embedding-based representations on both data sets showed a performance improvement. The average AUROC, AUPRC, and F1-score of the 6 representation methods with embeddings were greater than those of the 4 reference methods without embeddings (0.855 vs 0.831, 0.203 vs 0.185, and 0.354 vs 0.328, respectively) on the public data set with the entire feature set. Further, among the 6 representations based on the skip-gram algorithm, those with the association strength–based improvement achieved better performance than those without.

When assembling feature representations into a patient representation, the assembling method and the involved features did matter. Representations based on the idea of weighted sum outperformed those based on the idea of average, on either the public data set with the entire feature set (AUROC, 0.863 to 0.878 vs 0.834 to 0.850) or the private data set with the entire feature set (0.967 to 0.973 vs 0.948 to 0.957). On the other hand, consistently superior predictive performance was achieved on both data sets with the entire feature set compared with the treatment-free feature set. Multimedia Appendix 5 shows the average predictive performance of patient representation methods on the public and private data sets with and without treatment feature sets.

Table 3

Predictive performance of patient representation methods on the private data set.

Feature set and representation method	AUROC^a, mean (95% CI)	AUPRC^b, mean (95% CI)	F1-score, mean (95% CI)
Entire feature set: embedding-based representation methods
CON_EM_WGT^c	0.973 (0.951-0.995)	0.505 (0.278-0.732)	0.674 (0.468-0.880)
CON_EM_AVE^d	0.957 (0.933-0.981)	0.312 (0.159-0.465)	0.479 (0.301-0.657)
ANT_EM_WGT^e	0.972 (0.948-0.996)	0.489 (0.258-0.720)	0.658 (0.442-0.874)
ANT_EM_AVE^f	0.953 (0.929-0.977)	0.310 (0.185-0.435)	0.478 (0.329-0.627)
RAN_EM_WGT^g	0.967 (0.942-0.992)	0.486 (0.263-0.709)	0.660 (0.460-0.860)
RAN_EM_AVE^h	0.948 (0.923-0.973)	0.287 (0.167-0.407)	0.451 (0.306-0.596)
Entire feature set: reference representation methods
DIS_AE^i	0.884 (0.845-0.923)	0.207 (0.144-0.270)	0.361 (0.279-0.443)
DIS_FS^j	0.938 (0.907-0.969)	0.283 (0.167-0.399)	0.452 (0.309-0.595)
Discretization	0.939 (0.908-0.970)	0.283 (0.165-0.401)	0.454 (0.307-0.601)
Mixture	0.904 (0.849-0.959)	0.251 (0.135-0.367)	0.417 (0.264-0.570)
Treatment-free feature set: embedding-based representation methods
CON_EM_WGT	0.926 (0.883-0.969)	0.282 (0.139-0.425)	0.456 (0.282-0.630)
CON_EM_AVE	0.915 (0.876-0.954)	0.248 (0.156-0.340)	0.413 (0.297-0.529)
ANT_EM_WGT	0.919 (0.874-0.964)	0.278 (0.133-0.423)	0.455 (0.275-0.635)
ANT_EM_AVE	0.912 (0.869-0.955)	0.256 (0.162-0.350)	0.423 (0.307-0.539)
RAN_EM_WGT	0.915 (0.868-0.962)	0.248 (0.119-0.377)	0.416 (0.238-0.594)
RAN_EM_AVE	0.897 (0.850-0.944)	0.225 (0.133-0.317)	0.385 (0.265-0.505)
Treatment-free feature set: reference representation methods
DIS_AE	0.884 (0.845-0.923)	0.207 (0.144-0.270)	0.361 (0.279-0.443)
DIS_FS	0.903 (0.862-0.944)	0.214 (0.124-0.304)	0.367 (0.236-0.498)
Discretization	0.905 (0.862-0.948)	0.224 (0.122-0.326)	0.381 (0.238-0.524)
Mixture	0.867 (0.806-0.928)	0.202 (0.116-0.288)	0.356 (0.227-0.485)

^aAUROC: area under the receiver operating characteristic curve.

^bAUPRC: area under the precision-recall curve.

^cCON_EM_WGT: weighted sum of the consequent-based embedding representation.

^dCON_EM_AVE: average of the consequent-based embedding representation.

^eANT_EM_WGT: weighted sum of the antecedent-based embedding representation.

^fANT_EM_AVE: average of the antecedent-based embedding representation.

^gRAN_EM_WGT: weighted sum of the random selection–based embedding representation.

^hRAN_EM_AVE: average of the random selection–based embedding representation.

^iDIS_AE: hidden vector of an autoencoder-based representation.

^jDIS_FS: discretization representations with feature selection.

Predictive Model Interpretation

Figure 4 illustrates the global feature attributions for the top 20 most important features from the private data set when predicting in-hospital death risk. The treatment-related features played an important role in the mortality prediction. These features included other surgery (mean absolute SHAP value: 0.413), diagnostic ultrasound (0.279), contrast agent cardiovascular angiography (0.197), etc (Figure 4A). Moreover, comorbidities such as hypertension (mean absolute SHAP value: 0.252) and heart disease complications (0.236), and laboratory tests such as serum glucose (0.188) and serum lactate dehydrogenase (0.139) had strong associations with in-hospital death (Figure 4B). SHAP values of features in the public data set are shown in Multimedia Appendix 6.

In addition to the feature’s global importance in the specific predictive task, SHAP values were helpful in distinguishing the feature’s local importance, that is, the importance for an individual sample. Figure 5 illustrates how the mortality risk was predicted with SHAP values for a patient who died during hospital stay and another patient who did not die. The positive SHAP values of most features of the patient who died during hospital stay increased the total SHAP value from an average value of −3.739 to a final value of −0.499 (Figures 5A and 5C), meaning that the patient had a higher risk of in-hospital death than the average. In this incremental process, female gender, for example, contributed a SHAP value of +0.21 (Figure 5C). On the contrary, the negative SHAP values of most features of the patient who was discharged alive decreased the total SHAP value from −3.739 to −6.169 (Figures 5B and 5D), indicating a lower death risk. In this decremental process, male gender contributed a SHAP value of −0.09 (Figure 5D). We have shown 2 examples of patients from the public data set in Multimedia Appendix 7.
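The additive structure behind these numbers can be checked with simple arithmetic: the final value equals the average (base) value plus the sum of the patient's SHAP values. The sketch below also converts the values to probabilities under the assumption that they are expressed on the log-odds scale of a logistic model; the text does not state the scale, so the probabilities are illustrative only.

```python
from math import exp


def sigmoid(x):
    """Logistic function, mapping log-odds to a probability."""
    return 1.0 / (1.0 + exp(-x))


base_value = -3.739      # average model output reported in the text
final_died = -0.499      # patient who died during hospital stay
final_survived = -6.169  # patient who was discharged alive

# Sum of all SHAP values for each patient = final value - base value.
total_shap_died = final_died - base_value
total_shap_survived = final_survived - base_value
print(round(total_shap_died, 3))      # 3.24
print(round(total_shap_survived, 3))  # -2.43

# ASSUMPTION: values are log-odds; the resulting probabilities are only illustrative.
for name, value in [("base", base_value), ("died", final_died), ("survived", final_survived)]:
    print(name, round(sigmoid(value), 4))
```

Under that assumption, the ordering of the three probabilities matches the interpretation in the text: the patient who died sits well above the average risk, and the patient who survived sits below it.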

Figure 4

The mean absolute Shapley additive explanations (SHAP) values of the top 20 features of the private data set within the entire feature set (A) and the treatment-free feature set (B).

Figure 5

Shapley additive explanations (SHAP) values for a patient who died during hospital stay (A and C) and another patient who did not die (B and D). Both patients were selected from the private data set with the entire feature set. A and B, all features with their SHAP values. C and D, 20 features with the greatest absolute SHAP values. Features in blue tend to reduce the possibility of a patient being classified as positive (death in this study), while features in red do the contrary. The meaning of each abbreviated feature name can be found in Multimedia Appendix 1.

Discussion

Principal Findings

With the widespread adoption of EMR data in building machine learning–based predictive models, one of the most fundamental research challenges was learning proper patient representations that might capture hidden semantic associations among medical concepts [18]. In this study, we proposed an improved skip-gram–based patient representation method where the association strength among medical concepts and the task-specific feature importance were integrated. Compared with other representation methods, the proposed patient representation improved the performance of the mortality risk prediction for AMI patients.

In previous studies, deep learning models [9,10,12,25,36] were used to train embedding representations of medical concepts for subsequent patient representation. When the skip-gram algorithm is applied to a patient record, the order of the medical concepts is independent of feature relevance, which hinders the algorithm from learning high-quality representations. Prior work recommended shuffling the medical concepts in a patient record to reduce the impact of this lack of meaningful order [12,24-26]. In this study, we introduced the association strength between 2 concepts, defined as the confidence of an association rule involving just those 2 concepts. Our experiments indicated that this modification was effective in revealing potential associations among medical concepts and in enhancing the performance of downstream prediction tasks.
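For a rule A → B involving only 2 concepts, the confidence is the number of records containing both A and B divided by the number of records containing A. The following minimal sketch computes this quantity for every ordered pair of concepts; the function name `pairwise_confidence` and the toy records are hypothetical, and the way the association strength is fed into skip-gram training is described in the Methods and is not reproduced here.

```python
from collections import Counter
from itertools import permutations


def pairwise_confidence(records):
    """Return {(a, b): confidence of the rule a -> b} over patient records.

    confidence(a -> b) = count(records with a and b) / count(records with a)
    """
    single_counts = Counter()
    pair_counts = Counter()
    for record in records:
        concepts = set(record)  # each concept counts once per record
        single_counts.update(concepts)
        pair_counts.update(permutations(sorted(concepts), 2))
    return {
        (a, b): pair_counts[(a, b)] / single_counts[a]
        for (a, b) in pair_counts
    }


# Toy patient records, each a set of medical concepts (illustrative only).
toy_records = [
    {"AMI", "hypertension", "PCI"},
    {"AMI", "hypertension"},
    {"AMI", "PCI"},
    {"hypertension"},
]

confidence = pairwise_confidence(toy_records)
print(confidence[("PCI", "AMI")])  # every record with PCI also has AMI
print(confidence[("AMI", "PCI")])  # 2 of the 3 AMI records include PCI
```

Confidence is asymmetric, so the strength of A → B can differ from that of B → A, unlike a plain co-occurrence count.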

In addition to the representation algorithm, the features used to represent a patient were also critical. Many previous studies focused on features in the original form of medical codes, such as disease diagnoses, procedures, and medications [1,11,14,37]. Because laboratory tests contain much diagnosis- and prognosis-relevant information about patients, we included the normal status of each laboratory test in the feature sets, rather than simply using the number of laboratory tests and test co-occurrences [12,38]. We further extracted radiological features from free-text radiological reports. Admittedly, richer features may lead to a feature representation with more information, even if the dimension of the patient representation remains unchanged. In this study, predictive models using more features to represent a patient did reflect more information about the patient and showed higher performance than those using fewer features. Our finding is similar to the results of other studies [39,40].

Prior studies employed neural networks to train predictive models for clinical outcomes using EMR data [2,16,22]. They focused on end-to-end prediction models built on large data sets, where the last hidden layer of the neural network was regarded as the patient representation. Although the deep end-to-end neural network–based patient representation improved the predictive accuracy, its lack of interpretability could not be ignored. Some studies [26,41] constructed patient representations using the average of concept representations learned by word embedding methods, which did not make full use of the importance of different clinical features for patients. As an advanced interpretability method, the SHAP value [32] was successfully used to analyze and explain the predictive models in some previous studies [40,42-44]. We introduced SHAP values as feature importance into the patient representation, and further explained the predictive model with SHAP values. SHAP values can be used not only to rank the overall importance of features and identify the important factors for the prediction task, but also to explore the key factors for predicting the mortality risk of a specific patient. In our predictive task for AMI patients, the most important features identified by SHAP values were closely related to AMI [45-47], such as serum glucose and serum creatine kinase, which are 2 critical laboratory tests for AMI diagnosis and prognosis in clinical practice.
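The difference between a plain average of concept embeddings and an importance-weighted combination can be illustrated with a small sketch. The weighting rule used here (absolute importance values normalized to sum to 1) is an assumption chosen for illustration and is not necessarily the exact formula of the proposed method; the function name `patient_vector`, the embeddings, and the importance values are hypothetical.

```python
def patient_vector(concepts, embeddings, importance=None):
    """Combine concept embeddings into one patient vector.

    With importance=None this is a plain average. Otherwise each concept is
    weighted by its absolute importance, normalized to sum to 1 (an assumed
    weighting rule used only for illustration).
    """
    dim = len(embeddings[concepts[0]])
    if importance is None:
        weights = [1.0] * len(concepts)
    else:
        weights = [abs(importance.get(c, 0.0)) for c in concepts]
    total = sum(weights)
    if total == 0:
        return [0.0] * dim
    vector = [0.0] * dim
    for concept, weight in zip(concepts, weights):
        for k in range(dim):
            vector[k] += (weight / total) * embeddings[concept][k]
    return vector


# Toy 2-dimensional concept embeddings and importance scores (illustrative).
toy_embeddings = {
    "serum_glucose_abnormal": [1.0, 0.0],
    "hypertension": [0.0, 1.0],
}
toy_importance = {"serum_glucose_abnormal": 0.3, "hypertension": 0.1}
toy_patient = ["serum_glucose_abnormal", "hypertension"]

plain = patient_vector(toy_patient, toy_embeddings)
weighted = patient_vector(toy_patient, toy_embeddings, toy_importance)
print(plain)     # both concepts contribute equally
print(weighted)  # the more important concept dominates
```

In the toy case, the weighted vector moves toward the embedding of the concept with the larger importance score, which is the effect that a plain average cannot capture.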

In our predictive task, the model that took all available patient characteristics, represented by the proposed patient representation method, as inputs showed higher performance than other models on the same task in previous studies (AUROC, 0.973 vs 0.905 to 0.935 [19,29-31,48]). This may be because the embedding representation contained a large number of diverse features extracted from a general EMR system, whereas many researchers selected AMI-related features with the assistance of clinical experts. For example, basic demographic data and a few laboratory tests, as well as several AMI-specific features such as Killip classification and left ventricular ejection fraction [19,30], were directly added into the machine learning model to predict mortality risk. Further, compared with other simple feature extraction methods such as principal component analysis [29] and the 3-layer autoencoder model, the proposed method took the association strength and feature importance into consideration, achieving higher predictive performance.

Limitations

This study had some limitations. First, only each patient’s first laboratory test results during the hospital stay were included in this study, although many patients underwent laboratory testing two or more times. Since temporal data, especially multiple laboratory tests, may reflect the dynamic health status and the treatment effect of a patient over time, the lack of temporal characteristics of laboratory tests in the patient representation may lead to performance loss in downstream tasks. A future study will focus on integrating these uneven and irregular temporal data into the current patient representation. Second, the skip-gram algorithm was used to train concept embeddings. This algorithm is popular in the natural language processing domain but may have limited ability to represent structured and unordered EMR data. A transformer-based pretrained model, Med-BERT, has been trained to represent disease diagnoses originally expressed in ICD-10 and ICD-9 codes, showing higher performance with AUROCs of 85.39% and 82.23% in heart failure and pancreatic cancer prediction tasks, respectively [49]. Therefore, more complex deep learning methods will be adopted for a more informative patient representation in the future. Lastly, we carried out only internal validation of the predictive model built on the proposed patient representation. High-quality external validation would be more convincing and would help in continuous algorithm improvement. Moreover, the reference methods chosen for the performance comparison were simple feature selection methods and a 3-layer autoencoder. Comparison with state-of-the-art methods is needed to evaluate the performance and potential use of our proposed method.

Conclusions

In this study, we improved the embedding-based patient representation with the association strength of medical concepts and importance of patient features. After further training and fine-tuning, the model based on the proposed patient representation will hopefully be used to assist in prognostic prediction for AMI inpatients. This study puts forward a meaningful direction for the development of more effective and efficient clinical prediction models using EMR data. It is desirable for patient representation learning to serve as an essential part of building a predictive model for clinical outcomes.

Multimedia Appendix 1

Patient features of samples in the private data set.

Multimedia Appendix 2

Patient features of samples in the public MIMIC-III data set.

Multimedia Appendix 3

Predictive performance of skip-gram–based embedding representations with different combinations of the size of the context window and the dimension of the embedding vector.

Multimedia Appendix 4

Predictive performance of patient representation methods on the public data set.

Multimedia Appendix 5

The average predictive performance of patient representation methods on the public and private data sets with and without treatment feature sets.

Multimedia Appendix 6

The mean absolute Shapley additive explanations (SHAP) values of the top 20 features of the public data set within the entire feature set (A) and the treatment-free feature set (B).

Multimedia Appendix 7

Shapley additive explanations (SHAP) values for a patient who died during hospital stay (A and C) and another patient who did not die (B and D) from the public data set with the entire feature set.

Abbreviations

AMI

acute myocardial infarction

ARI

adjusted Rand index

AUPRC

area under the precision-recall curve

AUROC

area under the receiver operating characteristic curve

EMR

electronic medical record

ICD

International Classification of Diseases

SHAP

Shapley additive explanations

This work was supported by the National Natural Science Foundation of China (No. 81971707).

None declared.

1. Xiao, Wei, Zhou, Pan, Jing, Zhao, Yuan. Treatment initiation prediction by EHR mapped PPD tensor based convolutional neural networks boosting algorithm. J Biomed Inform 2021 Aug;120:103840. doi: 10.1016/j.jbi.2021.103840. PMID: 34139331. PII: S1532-0464(21)00169-6.

2. Jiang, Huang. Long-term prediction for temporal propagation of seasonal influenza using Transformer-based model. J Biomed Inform 2021 Oct;122:103894. doi: 10.1016/j.jbi.2021.103894. PMID: 34454080. PII: S1532-0464(21)00223-9.

3. Sheng, Hou, Yang, Cai, Zhang. A novel hierarchical machine learning model for hospital-acquired venous thromboembolism risk assessment among multiple-departments. J Biomed Inform 2021 Oct;122:103892. doi: 10.1016/j.jbi.2021.103892. PMID: 34454079. PII: S1532-0464(21)00221-5.

4. Tang, Sow, Mei. Disease network delineates the disease progression profile of cardiovascular diseases. J Biomed Inform 2021 Mar;115:103686. doi: 10.1016/j.jbi.2021.103686. PMID: 33493631. PII: S1532-0464(21)00015-0.

5. Chen, Dong, Kaymak, Huang. Deep representation learning for individualized treatment effect estimation using electronic health records. J Biomed Inform 2019 Dec;100:103303. doi: 10.1016/j.jbi.2019.103303. PMID: 31610264. PII: S1532-0464(19)30222-9.

6. Dligach, Afshar, Miller. Toward a clinical text encoder: pretraining for clinical natural language processing with applications to substance misuse. J Am Med Inform Assoc 2019 Nov 01;26(11):1272-1278. doi: 10.1093/jamia/ocz072. PMID: 31233140. PII: 5522436. PMCID: PMC6798566.

7. Kamal, Yin, Qian, Zhang. An interpretable risk prediction model for healthcare with pattern attention. BMC Med Inform Decis Mak 2020 Dec 30;20(Suppl 11):307. doi: 10.1186/s12911-020-01331-7. PMID: 33380322. PMCID: PMC7772928.

8. Zhang, Chou, Liang, Xiao, Zhao, Sarva, Henchcliffe, Wang. Data-driven subtyping of Parkinson's disease using longitudinal clinical records: A cohort study. Sci Rep 2019 Jan 28;9(1):797. doi: 10.1038/s41598-018-37545-z. PMID: 30692568. PMCID: PMC6349906.

9. Luo, Gandhi, Zhang, Shao, Han, Chandrasekaran, Turzhitsky, Bali, Roberts, Metzger, Baker, La Rosa, Weaver, Dexter, Huang. Applying interpretable deep learning models to identify chronic cough patients using EHR data. Comput Methods Programs Biomed 2021 Oct;210:106395. doi: 10.1016/j.cmpb.2021.106395. PMID: 34525412. PII: S0169-2607(21)00469-7.

10. Roberts, Jiang, Long. Distributed learning from multiple EHR databases: Contextual embedding models for medical events. J Biomed Inform 2019 Apr;92:103138. doi: 10.1016/j.jbi.2019.103138. PMID: 30825539. PII: S1532-0464(19)30056-5. PMCID: PMC6533615.

11. Barbieri, Kemp, Perez-Concha, Kotwal, Gallagher, Ritchie, Jorm. Benchmarking deep learning architectures for predicting readmission to the ICU and describing patients-at-risk. Sci Rep 2020 Jan 24;10(1):1111. doi: 10.1038/s41598-020-58053-z. PMID: 31980704. PMCID: PMC6981230.

12. Steinberg, Jung, Fries, Corbin, Pfohl, Shah. Language models are an effective representation learning technique for electronic health record data. J Biomed Inform 2021 Jan;113:103637. doi: 10.1016/j.jbi.2020.103637. PMID: 33290879. PII: S1532-0464(20)30265-3. PMCID: PMC7863633.

13. Carrasco-Ribelles, Pardo-Mas, Tortajada, Sáez, Valdivieso, García-Gómez. Predicting morbidity by local similarities in multi-scale patient trajectories. J Biomed Inform 2021 Aug;120:103837. doi: 10.1016/j.jbi.2021.103837. PMID: 34119690. PII: S1532-0464(21)00166-0.

14. Yang, Zheng, Chen. Multi-layer representation learning and its application to electronic health records. Neural Process Lett 2021;53(2):1417-1433. doi: 10.1007/s11063-021-10449-2. PMID: 33623481. PII: 10449. PMCID: PMC7891814.

15. Wang, Zhou, Yin. Weight-based multiple empirical kernel learning with neighbor discriminant constraint for heart failure mortality prediction. J Biomed Inform 2020 Jan;101:103340. doi: 10.1016/j.jbi.2019.103340. PMID: 31756495. PII: S1532-0464(19)30259-X.

16. Rongali, Rose, McManus, Bajracharya, Kapoor, Granillo. Learning latent space representations to predict patient outcomes: Model development and validation. J Med Internet Res 2020 Mar 23;22(3):e16374. doi: 10.2196/16374. PMID: 32202503. PII: v22i3e16374. PMCID: PMC7136840.

17. Tahayori, Chini-Foroush, Akhlaghi. Advanced natural language processing technique to predict patient disposition based on emergency triage notes. Emerg Med Australas 2021;33(3):480-484. doi: 10.1111/1742-6723.13656. PMID: 33043570.

18. Jiang, Miller, Wang, Jim Zheng, Roberts. Deep representation learning of patient data from Electronic Health Records (EHR): A systematic review. J Biomed Inform 2021 Mar;115:103671. doi: 10.1016/j.jbi.2020.103671. PMID: 33387683. PII: S1532-0464(20)30299-9.

19. Kwon, Jeon, Kim, Lim, Kim, Song, Park, Choi. Deep-learning-based risk stratification for mortality of patients with acute myocardial infarction. PLoS One 2019;14(10):e0224502. doi: 10.1371/journal.pone.0224502. PMID: 31671144. PII: PONE-D-19-14012. PMCID: PMC6822714.

20. Ruan, Lei, Zhou, Zhai, Zhang, Gao. Representation learning for clinical time series prediction tasks in electronic health records. BMC Med Inform Decis Mak 2019 Dec 17;19(Suppl 8):259. doi: 10.1186/s12911-019-0985-7. PMID: 31842854. PMCID: PMC6916209.

21. Zhao, Papapetrou, Asker, Boström. Learning from heterogeneous temporal data in electronic health records. J Biomed Inform 2017 Jan;65:105-119. doi: 10.1016/j.jbi.2016.11.006. PMID: 27919732. PII: S1532-0464(16)30165-4.

22. Morid, Sheng, ORL, Kawamoto, Abdelrahman. Learning hidden patterns from patient multivariate time series data using convolutional neural networks: A case study of healthcare cost prediction. J Biomed Inform 2020 Nov;111:103565. doi: 10.1016/j.jbi.2020.103565. PMID: 32980530. PII: S1532-0464(20)30193-3.

23. Mikolov, Sutskever, Chen, Corrado, Dean. Distributed representations of words and phrases and their compositionality. In: NIPS'13: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2; December 5-10, 2013; Lake Tahoe, Nevada. 2013:3111-3119. doi: 10.5555/2999792.2999959.

24. Glicksberg, Miotto, Johnson, Shameer, Chen, Dudley. Automated disease cohort selection using word embeddings from Electronic Health Records. Pac Symp Biocomput 2018;23:145-156. PMID: 29218877. PII: 9789813235533_0014. PMCID: PMC5788312.

25. Cui, Xie, Shen. Prediction task guided representation learning of medical codes in EHR. J Biomed Inform 2018 Aug;84:1-10. doi: 10.1016/j.jbi.2018.06.013. PMID: 29928997. PII: S1532-0464(18)30119-9.

26. Huang, Wang, Zhang, Liu, Fei, Wei, Chen. Patient representation from structured electronic medical records based on embedding technique: Development and validation study. JMIR Med Inform 2021 Jul 23;9(7):e19905. doi: 10.2196/19905. PMID: 34297000. PII: v9i7e19905. PMCID: PMC8367145.

27. Chen, Shi, Xue, Wang, Dong, Cai, Chen, Zhu, Meng. Geographic variations in in‐hospital mortality and use of percutaneous coronary intervention following acute myocardial infarction in China: A nationwide cross‐sectional analysis. J Am Heart Assoc 2018 Apr 17;7(8):52-62. doi: 10.1161/JAHA.117.008131.

28. GBD 2013 Mortality Causes of Death Collaborators. Global, regional, and national age–sex specific all-cause and cause-specific mortality for 240 causes of death, 1990–2013: a systematic analysis for the Global Burden of Disease Study 2013. The Lancet 2015 Jan 10;385(9963):117-171. doi: 10.1016/S0140-6736(14)61682-2. PMID: 25530442. PII: S0140-6736(14)61682-2. PMCID: PMC4340604.

29. Lee, Park, Choe, Ahn, Lee, Choi, Cha, Hong, Jeong; Korea Acute Myocardial Infarction Registry (KAMIR) Korea Working Group on Myocardial Infarction (KorMI) Investigators. Prediction of 1-year mortality from acute myocardial infarction using machine learning. Am J Cardiol 2020 Oct 15;133:23-31. doi: 10.1016/j.amjcard.2020.07.048. PMID: 32811651. PII: S0002-9149(20)30783-9.

30. Aziz, Malek, Ibrahim, Raja Shariff, Wan Ahmad, Ali, Liu, Selvaraj, Kasim. Short- and long-term mortality prediction after an acute ST-elevation myocardial infarction (STEMI) in Asians: A machine learning approach. PLoS One 2021;16(8):e0254894. doi: 10.1371/journal.pone.0254894. PMID: 34339432. PII: PONE-D-21-01510. PMCID: PMC8328310.

31. Wang, Qian, Sun, Zhu, Liu, Chen, Sun. Nomograms based on pre-operative parametric for prediction of short-term mortality in acute myocardial infarction patients treated invasively. Aging (Albany NY) 2020 Dec 11;13(2):2184-2197. doi: 10.18632/aging.202230. PMID: 33323557. PII: 202230. PMCID: PMC7880403.

32. Lundberg, Lee. A unified approach to interpreting model predictions. In: NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems; December 4-9, 2017; Long Beach, California. 2017:4768-4777. doi: 10.5555/3295222.

33. Medical Information Mart for Intensive Care. https://mimic.mit.edu/ (accessed 2022-07-23).

34. Wang, Zhou, Liu, Wei, Fei, Chen. Sequential data-based patient similarity framework for patient outcome prediction: Algorithm development. J Med Internet Res 2022 Jan 06;24(1):e30720. doi: 10.2196/30720. PMID: 34989682. PII: v24i1e30720. PMCID: PMC8778569.

35. Xie, Gao, Xie, Liu, Grant. Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors. Information Sciences 2016 Aug;354:19-40. doi: 10.1016/j.ins.2016.03.011.

36. Wang, Bai, Liu, Zhang, Jiang, Wang, Zhou. EHR2Vec: Representation learning of medical concepts from temporal patterns of clinical notes based on self-attention mechanism. Front Genet 2020;11:630. doi: 10.3389/fgene.2020.00630. PMID: 32714371. PMCID: PMC7344186.

37. Bai, Chanda, Egleston, Vucetic. EHR phenotyping via jointly embedding medical concepts and words into a unified vector space. BMC Med Inform Decis Mak 2018 Dec 12;18(Suppl 4):123. doi: 10.1186/s12911-018-0672-0. PMID: 30537974. PMCID: PMC6290514.

38. Miotto, Kidd, Dudley. Deep patient: An unsupervised representation to predict the future of patients from the electronic health records. Sci Rep 2016 May 17;6:26094. doi: 10.1038/srep26094. PMID: 27185194. PII: srep26094. PMCID: PMC4869115.

39. Zhang, Yin, Zeng, Yuan, Zhang. Combining structured and unstructured data for predictive models: a deep learning approach. BMC Med Inform Decis Mak 2020 Oct 29;20(1):280. doi: 10.1186/s12911-020-01297-6. PMID: 33121479. PMCID: PMC7596962.

40. Liu, Pan, Mao, Liang, Wang, Chen. Explainable dynamic multimodal variational autoencoder for the prediction of patients with suspected central precocious puberty. IEEE J. Biomed. Health Inform 2022 Mar;26(3):1362-1373. doi: 10.1109/jbhi.2021.3103271.

41. Choi, Schuetz, Stewart, Sun. Medical concept representation learning from electronic health records and its application on heart failure prediction. arXiv 2016. https://arxiv.org/abs/1602.03686 (accessed 2022-07-19).

42. Müller, Gromicho, de Carvalho, Madeira. Explainable models of disease progression in ALS: Learning from longitudinal clinical data with recurrent neural networks and deep model explanation. Computer Methods and Programs in Biomedicine Update 2021;1:100018. doi: 10.1016/j.cmpbup.2021.100018.

43. Lundberg, Erion, Chen, DeGrave, Prutkin, Nair, Katz, Himmelfarb, Bansal, Lee. From local explanations to global understanding with explainable AI for trees. Nat Mach Intell 2020 Jan;2(1):56-67. doi: 10.1038/s42256-019-0138-9. PMID: 32607472. PMCID: PMC7326367.

44. Lundberg, Nair, Vavilala, Horibe, Eisses, Adams, Liston, Low, DKW, Newman, Kim, Lee. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat Biomed Eng 2018 Oct;2(10):749-760. doi: 10.1038/s41551-018-0304-0. PMID: 31001455. PMCID: PMC6467492.

45. Pinto, Grandin. Risk prediction in AMI shock: Goldilocks and the search for "just right". J Am Coll Cardiol 2017 Apr 18;69(15):1921-1923. doi: 10.1016/j.jacc.2017.02.024. PMID: 28408021. PII: S0735-1097(17)30674-5.

46. Yeh, Sidney, Chandra, Sorel, Selby. Population trends in the incidence and outcomes of acute myocardial infarction. N Engl J Med 2010 Jun 10;362(23):2155-2165. doi: 10.1056/nejmoa0908610.

47. Shroff, Frederick, Herzog. Renal failure and acute myocardial infarction: clinical characteristics in patients with advanced chronic kidney disease, on dialysis, and without chronic kidney disease. A collaborative project of the United States Renal Data System/National Institutes of Health and the National Registry of Myocardial Infarction. Am Heart J 2012 Mar;163(3):399-406. doi: 10.1016/j.ahj.2011.12.002. PMID: 22424010. PII: S0002-8703(11)00883-0. PMCID: PMC3313453.

48. D'Ascenzo, De Filippo, Gallone, Mittone, Deriu, Iannaccone, Ariza-Solé, Liebetrau, Manzano-Fernández, Quadri, Kinnaird, Campo, Simao Henriques, Hughes, Dominguez-Rodriguez, Aldinucci, Morbiducci, Patti, Raposeiras-Roubin, Abu-Assi, De Ferrari, Piroli, Saglietto, Conrotto, Omedé, Montefusco, Pennone, Bruno, Bocchino, Boccuzzi, Cerrato, Varbella, Sperti, Wilton, Velicki, Xanthopoulou, Cequier, Iniguez-Romo, Munoz Pousa, Cespon Fernandez, Caneiro Queija, Cobas-Paz, Lopez-Cuenca, Garay, Blanco, Rognoni, Biondi Zoccai, Biscaglia, Nunez-Gil, Fujii, Durante, Song, Kawaji, Alexopoulos, Huczek, Gonzalez Juanatey, Nie, Kawashiri, Colonnelli, Cantalupo, Esposito, Leonardi, Grosso Marra, Chieffo, Michelucci, Piga, Malavolta, Gili, Mennuni, Montalto, Oltrona Visconti, Arfat. Machine learning-based prediction of adverse events following an acute coronary syndrome (PRAISE): a modelling study of pooled datasets. The Lancet 2021 Jan;397(10270):199-207. doi: 10.1016/S0140-6736(20)32519-8.

49. Rasmy, Xiang, Xie, Tao, Zhi. Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction. NPJ Digit Med 2021 May 20;4(1):86. doi: 10.1038/s41746-021-00455-y. PMID: 34017034. PMCID: PMC8137882.