This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
Electronic medical records (EMRs) contain a considerable amount of information about patients. The rapid adoption of EMRs and the integration of nursing data into clinical repositories have made large quantities of clinical data available for both clinical practice and research.
In this study, we aimed to investigate whether readily available longitudinal EMR data, including nursing records, could be used to compute the risk of inpatient falls, and to assess the accuracy of these risk estimates compared with that of existing fall risk assessment tools.
We used 2 study cohorts from 2 tertiary hospitals located near Seoul, South Korea, that had different EMR systems. The modeling cohort included 14,307 admissions (122,179 hospital days), and the validation cohort comprised 21,172 admissions (175,592 hospital days); each cohort was drawn from 6 nursing units. A probabilistic Bayesian network model was used, and patient data were divided into 24-hour windows. In addition, data on existing fall risk assessment tools, nursing processes, Korean Patient Classification System groups, and medications and administration data were used as model parameters. Model evaluation metrics were averaged over 10-fold cross-validation.
The initial model showed an error rate of 11.7% and a spherical payoff of 0.91 with a c-statistic of 0.96, representing far superior performance to that of the existing fall risk assessment tool (c-statistic=0.69). The cross-site validation revealed an error rate of 4.87% and a spherical payoff of 0.96 with a c-statistic of 0.99, compared with a c-statistic of 0.65 for the existing fall risk assessment tool. The calibration curves for the model displayed more reliable results than those for the fall risk assessment tools alone. In addition, nursing intervention data, like the risk factors of individual patients, appeared to contribute to reducing the variance in the fall rate.
A risk prediction model that considers longitudinal EMR data including nursing interventions can improve the ability to identify individual patients likely to fall.
A considerable body of literature exists on fall prevention and reduction, yet despite many attempts by hospitals to reduce fall rates, significant and sustained reductions have proved elusive [
EMRs contain a considerable amount of information about patients’ histories, recorded both as discrete events and in narratives such as nursing notes. The increasing adoption of EMRs makes such clinical documentation a potentially rich and underutilized source of information for supporting nursing decisions [
The rapid adoption of EMRs and the integration of nursing data into clinical repositories have made large quantities of clinical data available for both clinical practice and research [
This study investigated the following research questions: (1) How can longitudinal data from nursing records be incorporated into fall risk modeling that predicts daily risk at the patient level? (2) How can EMR data be incorporated into a fall risk modeling paradigm, focusing on 2 types of EMR data elements (structured data and semistructured data)? and (3) Does a fall risk model developed at a particular site or in a particular EMR system environment work at another site with a different EMR system and a different fall risk assessment tool?
This research team cast the problem of risk modeling as a probabilistic Bayesian network, which has several advantages for capturing and reasoning with uncertainty [
The 2 study cohorts were derived from the clinical data repositories of 2 institutions. One tertiary hospital was the “development site,” while the other tertiary hospital was the “validation site”; both are located near Seoul, South Korea. Both hospitals have approximately 1000 beds and have used EMR systems for >10 years. The development site had 24,000 coded nursing statements mapped to the International Classification for Nursing Practice (ICNP) terminology. These statements are used for documenting nursing notes with free-text entries. The validation site had coded nursing statements represented by 3N (North American Nursing Diagnosis Association, Nursing Intervention Classification, and Nursing Outcome Classification). The 2 study sites have different EMR systems with 2 different terminology standards and 2 different fall risk assessment tools.
The development cohort consisted of inpatients admitted to 6 nursing units with high fall rates from September 1, 2014, to August 31, 2015. Patients were mainly registered in the cardiovascular, hematology-oncology, and neurology medical departments. Inclusion criteria were age ≥18 years and admission for at least 24 hours. Exclusion criteria were admission to a psychiatric, obstetric, emergency, or pediatric medical department; patients who died or had received resuscitation treatment were also excluded. We identified 14,307 admissions (122,179 hospital days) that met the inclusion criteria. We identified 220 fall events by analyzing the hospital’s event-reporting system, and an additional 18 cases were found through chart reviews conducted after prefiltering the free-text entries.
The validation cohort included 21,172 admissions (172,592 hospital days) from 6 medical-surgical nursing units. The units were selected on the basis of consistent nurse staffing and a case mix associated with high fall rates in the hospital. The eligibility criteria applied to the development cohort were also applied to the validation cohort. As the fall rate on nursing units was estimated to be lower at the validation site, we extended data collection to a 2-year period from June 1, 2014, to May 31, 2016. A total of 292 falls were identified after analyzing the reporting system and chart reviews. We adopted the National Database of Nursing Quality Indicators (NDNQI) operational definition of falls and level of injury [
Each cohort was divided randomly into model training and testing sets. For both training and testing, patient stays were divided into 24-hour windows because nurses’ fall risk assessments can be conducted on a daily basis. For example, a patient hospitalized for 4 days can have at most 4 fall risk assessments performed and documented in the EMR. A sliding-window approach was used to generate multiple windows covering each patient’s hospital stay, with the window shifted to each successive fall event. For fallers, only data from the 24 hours before a fall were considered; earlier data were excluded because it remains unclear whether they should receive a positive or negative label. For nonfallers, all data were included and labeled as negative. Samples were split into training and testing sets such that all samples from a given patient fell into only one of these sets. This approach mirrors the end-use situation more closely, in which the system is applied to patients other than those on whom the model was trained. The imbalance between positive and negative labels was removed by oversampling the positives according to the ratio of positive-to-negative examples. According to a study [
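The windowing, labeling, patient-level split, and oversampling steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study’s pipeline: the `Stay` record layout, the helper names, the test fraction, and the plain random duplication of positive windows are all hypothetical (the oversampling method actually used is only cited, not described, here), and each window is indexed by hospital day, with a fall on day d represented by the window covering the 24 hours before it.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Stay:
    """One admission (hypothetical record layout)."""
    patient_id: str
    los_days: int                                       # length of stay in whole days
    fall_days: list[int] = field(default_factory=list)  # 0-based days on which falls occurred


def make_windows(stay: Stay) -> list[tuple[str, int, int]]:
    """Cut a stay into 24-hour windows, returned as (patient_id, day, label)."""
    if not stay.fall_days:
        # Nonfallers: every daily window is kept and labeled negative.
        return [(stay.patient_id, day, 0) for day in range(stay.los_days)]
    # Fallers: keep only the window covering the 24 hours before each fall;
    # all other windows are dropped because their label is ambiguous.
    return [(stay.patient_id, day, 1) for day in sorted(set(stay.fall_days))]


def split_by_patient(windows, test_frac=0.3, seed=0):
    """Assign whole patients, never individual windows, to train or test."""
    patients = sorted({pid for pid, _, _ in windows})
    random.Random(seed).shuffle(patients)
    test_ids = set(patients[: int(len(patients) * test_frac)])
    train = [w for w in windows if w[0] not in test_ids]
    test = [w for w in windows if w[0] in test_ids]
    return train, test


def oversample_positives(train, seed=0):
    """Duplicate randomly chosen positive windows until the classes are balanced."""
    pos = [w for w in train if w[2] == 1]
    neg = [w for w in train if w[2] == 0]
    if not pos:
        return list(train)
    rng = random.Random(seed)
    extra = [rng.choice(pos) for _ in range(len(neg) - len(pos))]
    return neg + pos + extra


stays = [Stay("p1", 4), Stay("p2", 5, [2]), Stay("p3", 3), Stay("p4", 6, [1, 4])]
windows = [w for s in stays for w in make_windows(s)]
train, test = split_by_patient(windows, test_frac=0.25)
train_balanced = oversample_positives(train)  # oversample the training set only
```

Oversampling is applied after the patient-level split so that duplicated positives never leak into the test set.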
This retrospective study was reviewed and approved by the institutional review boards at the 2 hospitals, and the need for patients’ informed consent was waived because the study involved the collection of deidentified data.
Variables were selected on the basis of a literature review focusing on clinical guidelines published within the past 5 years (2012-2017). We adopted the following 8 fall prevention guidelines recommended by the Joint Commission [
Concepts derived from the literature review and local data elements mapped to concept variables in the prediction model.
Category and care component | Model concept | EMR data element in development site | EMR data element in validation site
Demographics | Age | Age | Age
Diagnosis or procedure | Primary and secondary dx, surgical operation | Medical dx (ICD code), dates of surgical operation | Medical dx (ICD code), dates of surgical operation
Administrative | Discharge unit, medical department, hospital days | Discharge unit, medical department, length of stay | Discharge unit, medical department, length of stay
Physiological or disease-related factors | Visual and hearing impairment, elimination impairment, gait, mobility impairment, use of walking aids or devices, presence of dizziness, general weakness, orthostatic hypotension, and pain | Nursing assessment and dx; physiologic evaluation and problems (eg, impaired mobility, incontinence, etc), KPCS | Nursing assessment and dx; physiologic evaluation and problems (eg, impaired mobility, incontinence, etc), KPCS
Cognitive factors | Dementia, delirium, disorientation, level of consciousness, fear, irritability, noncompliance | Nursing assessment or dx; cognitive function (eg, acute confusion, disorientation, noncompliance, etc) | Nursing assessment or dx; cognitive function (eg, acute confusion, disorientation, noncompliance, etc)
Behavioral factors | Fall history, sleep impairment | Presence of past falls, nursing dx related to sleep | Presence of past falls, nursing dx related to sleep
Therapeutics | Medications, adverse reactions to medications, catheters (IV line, tube, Foley), use of restraints | Medication list by class (sedatives, antidepressants, antiemetics, antipsychotics, antianxiety drugs, diuretics, antiepileptics, antihypertensives, analgesics, antiarrhythmics, and NSAIDs); physician orders for fluid injection, tubes, Foley catheters, and restraints | Medication list by class (sedatives, antidepressants, antiemetics, antipsychotics, antianxiety drugs, diuretics, antiepileptics, antihypertensives, analgesics, antiarrhythmics, and NSAIDs); physician orders for fluid injection, tubes, Foley catheters, and restraints
Universal fall precautions | Fall precautions on admission, regular rounds | Nursing interventions: safety education on admission, rounds every 2 hours | Nursing interventions: safety education on admission, rounds every 2 hours
Education and communication | Patient and caregiver education, presence of a bedsitter, use of visual indicators, communicating fall risk status to the care team | Nursing interventions: fall prevention education, presence of a bedsitter, use of visual indicators, and activities communicating fall risk status to the care team | Nursing interventions: fall prevention education, presence of a bedsitter, use of visual indicators, and activities communicating fall risk status to the care team
Observation and surveillance | Fall risk assessment tool | Hendrich II score and subscores [ | STRATIFY score and subscores [
Risk-targeted intervention | Cognitive and mental function | Nursing interventions: repeated orientation, hourly rounding, assigning a room close to the nursing station, keeping caregivers or family members at the bedside, etc | Nursing interventions: repeated orientation, hourly rounding, assigning a room close to the nursing station, keeping caregivers or family members at the bedside, etc
Risk-targeted intervention | Toileting problem | Nursing interventions: toilet scheduling, toileting assistance, provision of a commode or bedpan, etc | Nursing interventions: toilet scheduling, toileting assistance, provision of a commode or bedpan, etc
Risk-targeted intervention | Impaired mobility | Nursing interventions: provision of mobility devices, walking aids, and assistance, etc | Nursing interventions: provision of mobility devices, walking aids, and assistance, etc
Risk-targeted intervention | Medication review | Nursing interventions: rearranging medication times, side-effect precautions, etc | Nursing interventions: rearranging medication times, side-effect precautions, etc
Risk-targeted intervention | Sleep disturbance | Nursing interventions: attention to night movement and noise, inducing sleep pattern changes, etc | Nursing interventions: attention to night movement and noise, inducing sleep pattern changes, etc
Environmental intervention | Keeping paths clear; inspecting furniture, equipment, lighting, floors, and room arrangement | Environment-targeted nursing interventions | Environment-targeted nursing interventions
EMR: electronic medical record.
dx: diagnoses.
ICD: International Classification of Diseases.
KPCS: Korean Patient Classification System.
IV: intravenous.
NSAIDs: nonsteroidal anti-inflammatory agents.
STRATIFY: St. Thomas’ Risk Assessment Tool in Falling Elderly Inpatients.
The 4 steps of building a predictive Bayesian network model. LOINC: Logical Observation Identifiers Names and Codes; ICNP: International Classification for Nursing Practice; EMR: electronic medical record.
Our research team followed 3 principles to facilitate the translation of the prediction model into practice: the model should be (1) based on existing nursing knowledge or clinical guidelines; (2) interpretable to users; and (3) parameterized so that it can be adjusted and refined as the target population’s characteristics change over time and across sites. At the development site, we first constructed a concept model, then mapped the concept variables to local data elements, and finally trained the model with local cohort data. The same concept model was then applied to the validation site, where the model parameters were trained and tested using the local cohort.
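The separation between a shared concept model and site-specific parameters can be illustrated with a small sketch: the graph (each node’s parents) stays fixed, local data elements are translated into concept variables through a per-site mapping table, and the conditional probability tables are re-estimated from each site’s own records. All node names, local codes, and the add-one (Laplace) smoothing below are hypothetical; the parameter-learning procedure used in the study may differ.

```python
from collections import Counter
from itertools import product

# Hypothetical shared concept model: node -> list of parent nodes (the graph G).
CONCEPT_PARENTS = {
    "sedative": [],
    "impaired_mobility": [],
    "fall": ["sedative", "impaired_mobility"],
}
STATES = {node: (0, 1) for node in CONCEPT_PARENTS}  # all nodes binary in this toy model

# Hypothetical site-specific maps from local data-element codes to concepts.
SITE_MAPPINGS = {
    "development": {"DEV-MOB-01": "impaired_mobility", "DEV-SED-07": "sedative"},
    "validation": {"VAL-NDX-88": "impaired_mobility", "VAL-RX-N05": "sedative"},
}


def to_concepts(record, mapping):
    """Translate one patient-day of local codes into concept-variable values."""
    row = {node: 0 for node in CONCEPT_PARENTS}
    for code in record["codes"]:
        if code in mapping:
            row[mapping[code]] = 1
    row["fall"] = record["fall"]
    return row


def learn_cpt(rows, node, alpha=1.0):
    """Estimate Pr(node | parents) from counts with add-alpha smoothing."""
    parents = CONCEPT_PARENTS[node]
    counts = Counter((tuple(r[p] for p in parents), r[node]) for r in rows)
    k = len(STATES[node])
    cpt = {}
    for config in product(*(STATES[p] for p in parents)):  # () once for root nodes
        total = sum(counts[(config, s)] for s in STATES[node])
        cpt[config] = {s: (counts[(config, s)] + alpha) / (total + alpha * k)
                       for s in STATES[node]}
    return cpt


# Same structure; parameters learned from one site's (toy) records.
dev_records = [
    {"codes": {"DEV-MOB-01"}, "fall": 1},
    {"codes": {"DEV-MOB-01", "DEV-SED-07"}, "fall": 1},
    {"codes": set(), "fall": 0},
    {"codes": {"DEV-SED-07"}, "fall": 0},
]
dev_rows = [to_concepts(r, SITE_MAPPINGS["development"]) for r in dev_records]
dev_fall_cpt = learn_cpt(dev_rows, "fall")
```

Moving to another site then only requires a new entry in the mapping table and a new call to `learn_cpt` on that site’s records; the graph itself is reused unchanged.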
The Bayesian network model was specified as follows. A Bayesian network or probability network B=(Pr, G) is a model of a multivariate probability distribution over a set of selected concept variables and consists of a graphical structure G and an associated distribution Pr [
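Because a Bayesian network represents the joint distribution as the product of each node’s conditional distribution given its parents, a fall probability can be obtained by summing that product over the unobserved variables. The toy network below (sedative use and impaired mobility as parents of a fall node) and all of its probabilities are hypothetical and chosen only for illustration; brute-force enumeration is feasible only for very small networks, and dedicated Bayesian network software typically uses more efficient exact inference algorithms.

```python
from itertools import product

# Hypothetical network: Sedative -> Fall <- ImpairedMobility.
P_SED = {0: 0.8, 1: 0.2}   # Pr(Sedative)
P_MOB = {0: 0.7, 1: 0.3}   # Pr(ImpairedMobility)
P_FALL = {(0, 0): 0.01, (0, 1): 0.05, (1, 0): 0.04, (1, 1): 0.15}  # Pr(Fall=1 | S, M)


def joint(s, m, f):
    """Pr(S=s, M=m, F=f) as the product of the local distributions."""
    p_f = P_FALL[(s, m)]
    return P_SED[s] * P_MOB[m] * (p_f if f == 1 else 1 - p_f)


def prob_fall(evidence):
    """Pr(Fall=1 | evidence) by summing the factorized joint over all assignments."""
    num = den = 0.0
    for s, m, f in product((0, 1), repeat=3):
        values = {"sedative": s, "mobility": m, "fall": f}
        if any(values[name] != v for name, v in evidence.items()):
            continue  # assignment contradicts the observed findings
        p = joint(s, m, f)
        den += p
        if f == 1:
            num += p
    return num / den


prior = prob_fall({})
with_sedative = prob_fall({"sedative": 1})
```

In this toy network, observing a sedative order raises the estimated fall probability from 0.0322 to 0.073.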
To build the Bayesian network model structure, we identified relationships between the concepts derived from the 8 fall prevention guidelines. The relationships, expressed as arcs in the network graph, were determined on the basis of physiological, chronological, and logical processes. For example, the visual impairment, frequent toileting, transfer, and mobility items of the 5 STRATIFY subscales relate closely to data from nursing assessments. Furthermore, the 7 Hendrich II subscales relate closely to medications, gender, and medical diagnoses, as well as to nursing assessments. These relationships were expressed in the network structure. The local conditional probability distributions $\Pr(V_i \mid \pi(V_i))$ of the variables $V_1, \ldots, V_n$ together define the joint distribution

$$\Pr(V_1, \ldots, V_n) = \prod_{i=1}^{n} \Pr(V_i \mid \pi(V_i)),$$

where $\pi(V_i)$ denotes the set of parents of $V_i$ in $G$.
The model prediction performance was assessed using sensitivity, specificity, receiver operating characteristics (ROC) curves, 10-fold cross-validation, and performance indices such as the spherical payoff [
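The performance indices named above can be computed from per-window predicted probabilities, as in the sketch below. The definitions follow common usage and are assumptions about how the reported figures were produced: the error rate uses a 0.5 cutoff, the logarithmic loss uses natural logarithms, the spherical payoff divides the probability assigned to the observed state by the Euclidean norm of the predicted distribution, and the c-statistic is the pairwise concordance (Mann-Whitney) estimate of the area under the ROC curve. The function name `evaluate` is hypothetical, and both classes must be present in the input.

```python
import math


def evaluate(probs, labels, threshold=0.5):
    """probs: predicted Pr(fall) per window; labels: observed 0/1 outcomes."""
    n = len(labels)
    # Error rate: share of windows whose predicted class (at the cutoff) is wrong.
    error_rate = sum((p >= threshold) != bool(y) for p, y in zip(probs, labels)) / n
    log_loss = spherical = 0.0
    for p, y in zip(probs, labels):
        p_true = p if y == 1 else 1 - p          # probability given to the observed state
        log_loss += -math.log(max(p_true, 1e-15))
        spherical += p_true / math.sqrt(p ** 2 + (1 - p) ** 2)
    # c-statistic: chance that a random faller outranks a random nonfaller (ties = 1/2).
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return {
        "error_rate": error_rate,
        "log_loss": log_loss / n,
        "spherical_payoff": spherical / n,
        "c_statistic": wins / (len(pos) * len(neg)),
    }
```

With 10-fold cross-validation, the same function would be applied to each held-out fold and the resulting values averaged. A spherical payoff of 1 and a logarithmic loss of 0 correspond to perfect, fully confident predictions.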
We performed a sensitivity analysis to establish the quality and clinical utility of the fully specified Bayesian network. We observed the output of the network to detect possible inaccuracies in the underlying probability distribution. We determined the degree to which variations in the posterior probability distributions were explained by other variables. The model sensitivity was calculated as the variance reduction with continuous variables and the entropy reduction with ordinal-scale or categorical variables. We used Netica modeling software (version 3.2, Norsys Software Corporation, Vancouver, Canada) to complete the analysis.
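Entropy reduction (the mutual information between the query node and a finding node) and variance reduction can both be computed from a joint probability table, as sketched below. The function names and the table layout (rows are query states, columns are finding states) are illustrative assumptions; Netica reports these quantities itself, and its exact conventions, such as the logarithm base, may differ from the base-2 choice used here.

```python
import math


def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def entropy_reduction(joint):
    """joint[q][f] = Pr(Q=q, F=f); returns H(Q) - H(Q | F), i.e. mutual information."""
    q_marg = [sum(row) for row in joint]
    f_marg = [sum(col) for col in zip(*joint)]
    h_q_given_f = sum(pf * entropy([joint[i][j] / pf for i in range(len(joint))])
                      for j, pf in enumerate(f_marg) if pf > 0)
    return entropy(q_marg) - h_q_given_f


def variance_reduction(values, joint):
    """Var(Q) - E_F[Var(Q | F)] for a numeric query node whose states take `values`."""
    def var(weights):
        mean = sum(v * w for v, w in zip(values, weights))
        return sum(w * (v - mean) ** 2 for v, w in zip(values, weights))

    q_marg = [sum(row) for row in joint]
    f_marg = [sum(col) for col in zip(*joint)]
    expected = sum(pf * var([joint[i][j] / pf for i in range(len(joint))])
                   for j, pf in enumerate(f_marg) if pf > 0)
    return var(q_marg) - expected
```

An independent finding yields a reduction of 0, whereas a finding that determines the query node completely removes all of its entropy or variance.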
Descriptive statistics on population profiles are presented as mean and SD or frequency and percentage values. The cohorts were compared using the chi-square test or
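As a check on the cohort comparison, the sex and length-of-stay statistics in the cohort characteristics table can be recomputed from the published counts and summary values. The sketch assumes a Pearson chi-square test without continuity correction and, for length of stay, Welch’s t test; neither choice is stated in the text, and the function names are illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic from summary statistics."""
    return (mean1 - mean2) / math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


# Females vs males at the development and validation sites.
chi2_sex = chi_square_2x2(6157, 14307 - 6157, 11199, 21172 - 11199)
# Length of stay in days, mean (SD), development vs validation site.
t_los = welch_t(8.54, 11.52, 14307, 8.15, 11.28, 21172)
```

Under these assumptions, the chi-square statistic comes out at about 332.2, matching the reported 332.20, and the t statistic at about 3.15, close to the reported 3.14 given that the published means and SDs are rounded.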
The 2 cohort populations had some differences in their characteristics (
Characteristics of the two cohorts.
Characteristic | Development site (n=14,307) | Validation site (n=21,172) | Test statistic | P value
Females, n (%) | 6157 (43.03) | 11,199 (52.90) | 332.20 | <.001
Age group (years), n (%) | | | 629.0 ( | <.001
<50 | 3165 (22.12) | 5593 (26.42) | N/A | N/A
50-60 | 3251 (22.72) | 3844 (18.16) | N/A | N/A
60-70 | 3356 (23.46) | 3517 (16.61) | N/A | N/A
70-80 | 3281 (22.93) | 5039 (23.80) | N/A | N/A
>80 | 1254 (8.76) | 3179 (15.02) | N/A | N/A
Length of stay in days, mean (SD) | 8.54 (11.52) | 8.15 (11.28) | 3.14 | .002
Primary diagnosis, n (%) | | | 11,701.0 ( | <.001
Neoplasm | 4639 (32.4) | 4869 (23.00) | N/A | N/A
Benign | 385 (2.7) | 1066 (5.03) | N/A | N/A
Circulatory disorder | 5670 (39.6) | 769 (3.63) | N/A | N/A
Respiratory and gastrointestinal disorders | 655 (4.6) | 5630 (26.60) | N/A | N/A
Surgical procedure | 517 (3.6) | 2163 (10.22) | N/A | N/A
Neurological disorder | 998 (7.0) | 263 (1.24) | N/A | N/A
Infectious disorder | 115 (0.8) | 813 (3.84) | N/A | N/A
Other | 1328 (9.3) | 5599 (26.45) | N/A | N/A
Presence of secondary diagnosis, n (%) | 14,242 (99.6) | 13,421 (63.40) | 6497.45 | <.001
KPCS group, n (%) | | | 52.8 ( | <.001
Group 1 | 227 (1.59) | 377 (1.78) | N/A | N/A
Group 2 | 8197 (57.29) | 11,349 (53.60) | N/A | N/A
Group 3 | 3898 (27.25) | 5630 (26.59) | N/A | N/A
Group 4 | 1627 (11.37) | 1332 (6.29) | N/A | N/A
Groups 5 and 6 | 262 (1.83) | 0 (0) | N/A | N/A
Number of medications daily, mean (SD) | 2.5 (6.8) | 18.6 (9.9) | −1835.04 | <.001
Total number of medications, mean (SD) | 24.4 (75.7) | 172.3 (317.7) | −63.07 | <.001
Number of falls per admission, n (%) | | | 4.7 ( | .09
One | 231 (1.61) | 284 (1.34) | N/A | N/A
Multiple | 7 (0.05) | 8 (0.04) | N/A | N/A
N/A: not applicable.
KPCS: Korean Patient Classification System; group 1 has the lowest nursing needs, while group 6 has the highest nursing needs.
The fall prediction model identified at the development site consisted of 56 nodes and 82 links. The error rate of the prediction model was 11.7%, and the spherical payoff was 0.91. Calibration curves, which plot observed against predicted event rates across deciles of predicted risk, revealed that prediction reliability differed between the prediction model and the Hendrich II tool (
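Decile-based calibration can be sketched as follows: windows are sorted by predicted risk, split into 10 groups of roughly equal size, and the mean predicted probability in each group is compared with the observed fall rate. Equal-count binning and the function name are assumptions, and the 95% CIs shown in the calibration figures are not computed here.

```python
def calibration_by_decile(probs, labels, n_bins=10):
    """Return (mean predicted risk, observed event rate) for each risk group."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    bins = []
    for k in range(n_bins):
        # Equal-count groups of windows, ordered from lowest to highest predicted risk.
        idx = order[k * len(order) // n_bins:(k + 1) * len(order) // n_bins]
        if idx:
            mean_pred = sum(probs[i] for i in idx) / len(idx)
            observed = sum(labels[i] for i in idx) / len(idx)
            bins.append((mean_pred, observed))
    return bins


# Toy data: predicted risk rises with index; the 10 highest-risk windows are fallers.
probs = [i / 100 for i in range(100)]
labels = [1 if i >= 90 else 0 for i in range(100)]
bins = calibration_by_decile(probs, labels)
```

A well-calibrated model produces points close to the diagonal, where the mean predicted risk of each group equals its observed fall rate.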
At the model development site, the sensitivity analysis showed that Hendrich II data reduced the variance the most (
The validation model consisted of 48 nodes and 80 links. The error rate was 4.87%. The logarithmic loss and spherical payoff were 0.13 and 0.96, respectively. These scores indicate the classification abilities of the model [
The area under the ROC curve was 0.99, slightly higher than that of the development site model and >30% higher than that of the STRATIFY tool (
Calibration curves for the prediction and Hendrich II models at the development site. The data are mean and 95% CIs.
Receiver operating characteristic curves showing the discrimination ability of fall prediction. AUC: area under the curve; STRATIFY: St. Thomas’ Risk Assessment Tool in Falling Elderly Inpatients.
Results of the sensitivity analysis for subgroup summations of the prediction models. Dark-gray and light-gray bars correspond to the development and validation sites, respectively.
Calibration curves for the prediction model and St. Thomas’ Risk Assessment Tool in Falling Elderly Inpatients (STRATIFY) tool at the validation site. The data are mean and 95% CIs.
We found that longitudinal EMR data could be incorporated successfully into a prediction model, which discriminated between at-risk and not-at-risk patients better than the existing fall risk assessment tools alone. The EMR data included in the model were medication data, patient classification (KPCS), fall risk assessment tool scores, nursing-process data (assessments, diagnoses, and interventions), demographics, and administrative data. The model exhibited acceptable performance at the 2 sites, which differed in EMR systems, patient populations, fall risk assessment tools, and nursing terminology standards. In particular, semistructured EMR data (mostly nursing-process data) were semantically incorporated into a prediction model. These results imply that evidence-based prediction models incorporating all relevant, time-variant data elements from an EMR system can serve as a more reliable guide than fall risk assessment tools alone.
The 2 sites involved in this study have different patient profiles in terms of age, primary diagnosis, and medication distributions. However, the rates of falling and injurious falls at the 2 sites were similar. This finding is consistent with a study from the National Institutes of Health in 2013 [
We used all the data available in the EMR systems that are known to be relevant to inpatient falls based on clinical guidelines. One of the challenges in EMR-based studies is the presence of missing data [
A key challenge when building predictive models from EMR data is handling nursing interventions. These interventions act as confounders: they can reduce the likelihood of a fall and thereby make it difficult to distinguish patients who are at risk according to their fall risk assessment scores from those who are at risk but whose fall risk has been mitigated by preventive interventions. Paxton et al [
The approach used in this study allowed both the performance and the uncertainty of the Bayesian network to be evaluated. The c-statistic values of 0.96 and 0.99 found in this study were much higher than those reported for EMR-based prediction models of mortality and clinical outcomes (c-statistic=0.84 and 0.83, respectively) [
Another comparable study is that of Marier et al, who investigated fall prediction using minimum dataset (MDS) and EMR data from residents of 13 nursing homes [
The approach adopted in this study has several advantages over previously proposed methods for estimating the risk of falling. The first advantage relates to external validation, which is uncommon: almost all studies have validated performance within the same EMR environment [
Second, this study incorporated >50 concepts mapped to 70 time-varying data elements, which represents a relatively large number of variable sets. We found only a small number of studies that used longitudinal EMR data, and they did not fully utilize the depth of information on patients available in the nursing records to identify predictor variables [
A third advantage of our approach relates to the incorporation of nursing-process data, including the fall prevention interventions provided to patients. It is difficult to find an EMR-based study that has integrated the nursing activities of assessments, diagnoses, and interventions—this was possible in this study because the 2 EMR systems included complete electronic nursing notes consisting of coded and standardized statements using locally developed data dictionaries [
Finally, our model provides a daily estimate of each patient’s risk of falling. As the estimated probability ranges from 0% to 100%, users can set a risk cutoff that gives the desired balance between sensitivity and specificity.
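Choosing a cutoff for a target sensitivity can be sketched as a scan over candidate thresholds. The function names and the 0.9 default target are hypothetical. Because sensitivity can only fall and specificity can only rise as the cutoff increases, the highest cutoff that still meets the sensitivity target also gives the best specificity among the qualifying cutoffs. Both fallers and nonfallers must be present in the data.

```python
def sens_spec(probs, labels, cutoff):
    """Sensitivity and specificity when windows with risk >= cutoff are flagged."""
    tp = sum(p >= cutoff and y == 1 for p, y in zip(probs, labels))
    fn = sum(p < cutoff and y == 1 for p, y in zip(probs, labels))
    tn = sum(p < cutoff and y == 0 for p, y in zip(probs, labels))
    fp = sum(p >= cutoff and y == 0 for p, y in zip(probs, labels))
    return tp / (tp + fn), tn / (tn + fp)


def pick_cutoff(probs, labels, min_sensitivity=0.9):
    """Highest cutoff whose sensitivity meets the target, with its sens and spec."""
    for cutoff in sorted(set(probs), reverse=True):
        sens, spec = sens_spec(probs, labels, cutoff)
        if sens >= min_sensitivity:
            return cutoff, sens, spec
    return None
```

Lowering the sensitivity target moves the cutoff up and flags fewer patients, trading missed fallers for fewer false alarms.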
The next steps involve implementing this approach more broadly, performing a prospective evaluation of the net benefits of providing fall prevention nursing decision support in practice, and validating the model at other sites. For example, interventions tailored to each patient’s individual fall risk factors could be recommended in real time. We plan to incorporate a tailored intervention guide based on the individual risk factors of at-risk patients. This will provide an opportunity to explore how the algorithms affect nurses’ clinical decision making.
We found that a risk prediction model that utilizes longitudinal EMR data on nursing assessments, diagnoses, and interventions can improve the ability to identify individual patients who are at a high risk of falling. The prediction model has demonstrated portability and reliability and can, therefore, be applied across hospitals with different EMR environments. Current EMR systems—even suboptimal ones—can be leveraged for the secondary use of clinical data to prevent patients from falling.
EMR: electronic medical record
ICNP: International Classification for Nursing Practice
KPCS: Korean Patient Classification System
MDS: minimum dataset
ROC: receiver operating characteristics
SMOTE: synthetic minority oversampling technique
STRATIFY: St. Thomas’ Risk Assessment Tool in Falling Elderly Inpatients
We would like to thank Keoung-Hee Choi, Jungmee Han, and Won-Hee Park of the medical information departments of the 2 hospitals for helping us retrieve the raw data efficiently. In addition, we appreciate the clinical staff of the nursing departments and the graduate students involved in mapping local data elements to standard vocabularies. This study was supported by a grant from the Korea Healthcare Technology R&D Project, Ministry for Health and Welfare, Republic of Korea (No. HI15C1089 and HI17C0809).
None declared.
IC conceived and designed the study, supervised and contributed to the data analysis, interpreted the results, and drafted and revised the paper. EHB and EJC contributed to the study design, data acquisition, results interpretation, and paper revision. DB and PD substantially contributed to data interpretation and made critical revisions regarding the intellectual content.