Trigger Tool–Based Automated Adverse Event Detection in Electronic Health Records: Systematic Review

Background: Adverse events (AEs) in health care entail substantial burdens for health care systems, institutions, and patients. Retrospective trigger tools are often applied manually to detect AEs, whereas automated approaches using electronic health records (EHRs) may offer real-time AE detection, allowing timely corrective interventions.

Objective: The aim of this systematic review was to describe current study methods and challenges regarding the use of automatic trigger tool-based AE detection methods in EHRs. In addition, we aimed to appraise the included studies' designs and to synthesize estimates of AE prevalence and the diagnostic test accuracy of automatic detection methods, using the manual trigger tool as the reference standard.

Methods: PubMed, EMBASE, CINAHL, and the Cochrane Library were queried. We included observational studies applying trigger tools in acute care settings and excluded studies in nonhospital and outpatient settings. Eligible articles were divided into diagnostic test accuracy studies and prevalence studies. We derived study prevalence and estimates of the positive predictive value (PPV). We assessed risks of bias and applicability concerns using the Quality Assessment tool for Diagnostic Accuracy Studies-2 (QUADAS-2) for diagnostic test accuracy studies and an in-house developed tool for prevalence studies.

Results: A total of 11 studies met all criteria: 2 concerned diagnostic test accuracy and 9 prevalence. We judged several studies to be at high risk of bias for their automated detection method, definition of outcomes, and type of statistical analyses. Across all 11 studies, AE prevalence ranged from 0% to 17.9%, with a median of 0.8%. The PPV of all triggers to detect AEs ranged from 0% to 100% across studies, with a median of 40%. Some triggers had wide-ranging PPVs: (1) in 6 studies, hypoglycemia had a PPV ranging from 15.8% to 60%; (2) in 5 studies, naloxone had a PPV ranging from 20% to 91%; (3) in 4 studies, flumazenil had a PPV ranging from 38.9% to 83.3%; and (4) in 4 studies, protamine had a PPV ranging from 0% to 60%. We were unable to determine the AE prevalence, PPV, preventability, and severity in 40.4%, 10.5%, 71.1%, and 68.4% of the studies, respectively; these studies did not report the overall number of records analyzed, triggers, or AEs, or did not conduct the analysis.

Conclusions: We observed broad interstudy variation in reported AE prevalence and PPV. Insufficiently described methods led to difficulties in interpretation. To improve quality, we see the need for a set of recommendations to endorse optimal use of research designs and adequate reporting in future AE detection studies.
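The two summary measures pooled in this review, prevalence and PPV, follow directly from record counts. A minimal sketch with hypothetical counts (not data from any reviewed study) illustrates the arithmetic:

```python
# Hypothetical counts, for illustration only (not taken from any reviewed study).
records_reviewed = 1000      # patient records screened by the automated tool
trigger_positive = 50        # records with at least one positive trigger
confirmed_aes = 20           # trigger-positive records confirmed as AEs on review

# Prevalence: confirmed AEs per record screened.
prevalence = confirmed_aes / records_reviewed    # 0.02

# Positive predictive value: confirmed AEs among trigger-positive records.
ppv = confirmed_aes / trigger_positive           # 0.4

print(f"Prevalence: {prevalence:.1%}")   # Prevalence: 2.0%
print(f"PPV: {ppv:.1%}")                 # PPV: 40.0%
```

Both measures require the denominators (records analyzed and trigger-positive records) that, as noted above, many studies failed to report.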

1. To determine the prevalence of AEs as detected by an electronic/automated or semiautomatic trigger tool in various adult inpatient populations.
2. To describe the reliability of electronic/automated trigger tools.
3. To explore the methods of phenotyping used for adverse event detection in EHRs in the international literature.
P: Patient records of patients hospitalized at least 48 hours (inpatient), with any specific disease, admitted to any ward
I: Global Trigger Tool (GTT) or a modified version (added/removed/modified triggers) used in an automatic or semi-automatic way
C: Not applicable
O: Prevalence overall, by type of AE, and by type of hospital or ward

Describe methods of patient selection:
A. Risk of bias: Could the selection of patient records have introduced bias?

Was the participation rate of eligible persons at least 50%? Yes / No / Unclear
If the rate is lower, bias is likely introduced (from http://www.nhlbi.nih.gov/healthpro/guidelines/in-develop/cardiovascular-risk-reduction/tools/cohort).

Was a consecutive or random sample of patient records enrolled? Yes / No / Unclear
Consider whether all subjects selected or recruited were from the same or similar populations (including the same time period) and whether the inclusion and exclusion criteria were pre-specified and applied uniformly to all participants in a consecutive manner. If all accessible patient records were selected as the sample, or if the sample was drawn by random sampling, answer "yes".

Did the study consider patients covering a broad range of indications for hospitalisation? Yes / No / Unclear
Answer "no" when patients with very different profiles are excluded from study entry. Such exclusions are highly likely to alter prevalence estimates; in this situation, the GTT might over- or underestimate adverse events. For example: exclusion of certain groups of patients because of extended lengths of stay or high numbers of transfers.

RISK: LOW / HIGH / UNCLEAR

B. Applicability: Are there concerns that the included patients and setting do not match the review question?
If a study did not meet the patient population as described in the objective there will be a high concern regarding its applicability. In this specific review, we allow for a broad range of settings and study populations.

Was the selection of the reviewer(s) based on his/her experience and/or professional background in the clinical setting? Yes / No / Unclear
A lack of experience on the part of the reviewer(s) in the clinical setting may introduce bias; for reviewers with an appropriate clinical background, the risk of misclassification bias may be lower.

Were the reviewer(s) trained on using and applying trigger tool methodology? Yes / No / Unclear
A lack of trigger tool training and application knowledge may introduce bias. For reviewer(s) with more training the bias might be lower.

Do the reviewer(s) have experience in applying the trigger tool or another retrospective chart review methodology? Yes / No / Unclear
A lack of trigger tool experience may introduce bias. For reviewer(s) with more trigger tool experience the bias might be lower.

Did the study use test and validation sets to develop the algorithm? Yes / No / Unclear
The development of EHR algorithms always involves choices (e.g., whether to optimize for sensitivity or precision). Using the split-half method is crucial to assess the efficacy of the proposed algorithm.
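The split-half method can be sketched as follows. This is a hypothetical, simplified illustration (the trigger rule, data, and threshold search are invented for the example, not taken from any reviewed study): the trigger rule is tuned on a development half and its PPV is reported only on the held-out validation half.

```python
import random

# Hypothetical data: (naloxone_doses, confirmed_AE) per record -- illustrative only.
random.seed(0)
records = [(random.randint(0, 3), random.random() < 0.3) for _ in range(200)]

# Split-half: develop the trigger rule on one half, validate on the other.
random.shuffle(records)
dev, val = records[:100], records[100:]

def ppv(data, threshold):
    """PPV of the rule 'flag if doses >= threshold' on the given records."""
    flagged = [ae for doses, ae in data if doses >= threshold]
    return sum(flagged) / len(flagged) if flagged else 0.0

# "Develop" the rule: pick the dose threshold with the best PPV on the dev half.
best = max(range(1, 4), key=lambda t: ppv(dev, t))

# Report performance only on the held-out validation half.
print(f"threshold={best}, validation PPV={ppv(val, best):.2f}")
```

Reporting accuracy on the same records used to tune the rule would overstate performance, which is why the signalling question asks for separate test and validation sets.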

Is the inter-rater reliability clearly stated and sufficiently high? Yes / No / Unclear
Although a clinical diagnostic test accuracy study should be conducted after reliability is more or less established, in the field of the GTT the evidence on reliability is variable. For this reason, we added a signalling question regarding reliability. We classified a study as "yes" if the inter-rater reliability was clearly assessed with sound methods and was judged to be acceptably high.
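Inter-rater reliability for binary screening judgements is commonly summarized with Cohen's kappa, which corrects observed agreement for agreement expected by chance. A minimal sketch with hypothetical ratings (illustrative values only; kappa is one common choice, not necessarily the statistic used in the reviewed studies):

```python
# Hypothetical stage-1 screening judgements from two independent raters
# (1 = trigger present, 0 = absent) -- illustrative values only.
rater_a = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
rater_b = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n   # p_o = 0.8

# Agreement expected by chance, from each rater's marginal rates (p_e).
p_a1, p_b1 = sum(rater_a) / n, sum(rater_b) / n
expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)               # p_e = 0.52

kappa = (observed - expected) / (1 - expected)
print(f"Cohen's kappa: {kappa:.2f}")   # Cohen's kappa: 0.58
```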

Describe methods to sample patients to be included in the reliability estimations:
State the determined number of reviewers:
State the number of patient records included in the reliability estimations:
State the number of replicated observations:
Describe the reviewer characteristics (e.g., training, experience):
Did raters judge/analyse the records independently (e.g., stage 1 screening)?

B. Applicability: Are there concerns that the reviewer(s) do not match the review question?
For example, if the profile(s) of the reviewer(s) applying the trigger tool in the study differ substantially from those of healthcare professionals who would apply it in clinical practice, a high concern may arise.

Was the application of the algorithms fully automatic? Yes / No / Unclear
Algorithms implemented in a semi-automatic way with different process steps are expected to introduce more bias (semi-automatic processes being more prone to errors).

Did the development of the algorithm involve clinician(s) and was it based on a test set or an empirical approach? Yes / No / Unclear
Involvement of a clinician, as well as an empirical development approach, is assumed/shown to improve validity.

Was the selection of triggers based on literature review and/or consultation with a group of experts in the field? Yes / No / Unclear
Empirical evidence or at least a biological rationale should exist for each of the triggers included in the tool. This particularly applies to new triggers not part of the original IHI set of triggers.

Are the triggers used the same for all settings and/or the hospitals participating in the study? Yes / No / Unclear
If differing versions of triggers are in use across participating units, a strong rationale should have been given to adapt them. For example, it is acceptable to adapt triggers according to national circumstances where a specific drug is not licensed.

Was the presence of any adverse event checked/controlled at the admission of the patient to the unit/hospital? Yes / No / Unclear
If not, AEs already present at admission might be wrongly attributed by the unit/hospital to its own care of the patient.

Are there concerns that the trigger tool test, its conduct, or its interpretation differ from the review question?
If the conduct of the test, the technology, the setting, or the interpretation differs from your review question, the results may not be applicable. For example: the triggers are not related to the IHI GTT, or the AEs captured in the study are actually triggers rather than AEs.