Development, Implementation, and Evaluation of a Personalized Machine Learning Algorithm for Clinical Decision Support: Case Study With Shingles Vaccination.

Background: Although clinical decision support (CDS) alerts are effective reminders of best practices, their effectiveness is blunted by clinicians who fail to respond to an overabundance of inappropriate alerts. An electronic health record (EHR)–integrated machine learning (ML) algorithm is a potentially powerful tool to increase the signal-to-noise ratio of CDS alerts and positively impact the clinician’s interaction with these alerts in general.
Objective: This study aimed to describe the development and implementation of an ML-based signal-to-noise optimization system (SmartCDS) to increase the signal of alerts by decreasing the volume of low-value herpes zoster (shingles) vaccination alerts.
Methods: We built and deployed SmartCDS, which builds personalized user activity profiles to suppress shingles vaccination alerts unlikely to yield a clinician’s interaction. We extracted all records of shingles alerts from January 2017 to March 2019 from our EHR system, including 327,737 encounters, 780 providers, and 144,438 patients.
Results: During the 6 weeks of pilot deployment, the SmartCDS system suppressed an average of 43.67% (15,425/35,315) of potential shingles alerts (appointments) and maintained stable counts of weekly shingles vaccination orders (326.3 with system active vs 331.3 in the control group; P=.38) and weekly user-alert interactions (1118.3 with system active vs 1166.3 in the control group; P=.20).
Conclusions: All key statistics remained stable while the system was turned on. Although the results are promising, the characteristics of the system can be subject to future data shifts, which require automated logging and monitoring. We demonstrated that an automated, ML-based method and data architecture to suppress alerts are feasible without detriment to overall order rates. This is the first ML-based alert suppression model deployed in practice and serves as foundational work in encounter-level customization of alert display to maximize effectiveness.


Background and Significance
The potential effectiveness of clinical decision support (CDS) alerts as a scalable tool for promoting evidence-based care for vaccine administration has led to their frequent use by most health care systems seeking to maximize vaccination rates [1][2][3][4]. CDS alerts to prompt evidence-based practices have been extensively studied and shown to work best when delivered at an appropriate time and place in the clinical workflow, that is, when the clinician is prepared to receive the information [5][6][7][8]. Successful CDS alerts have reduced the prescribing of brand-name antibiotics [9], improved lipid management in renal transplant patients [10], improved compliance with guidelines for treating HIV [11][12][13], reduced ordering of tests when costs were displayed [14], and, through age-specific alerts, reduced inappropriate prescribing in the elderly [15][16][17][18][19][20].
Although CDS tools are effective reminders of best practices, their effectiveness is blunted by the context in which they are deployed; alert fatigue (clinician desensitization driven by the overwhelming number and poor quality of safety alerts) [21] is the result of an ever-growing number of alerts in the electronic health record (EHR), leading to clinicians commonly ignoring or failing to respond appropriately to alerts. Alert fatigue resulting from an excess of poor-quality alerts (eg, alerts firing at inappropriate times or for inappropriate patients) contributes to clinicians' perceptions that the bulk of alerts are likely clinically insignificant regardless of their clinical message. As a result, clinicians now override most medication alerts [22][23][24] and are becoming increasingly desensitized to alarms [25][26][27]. Although there is limited consensus on how to measure alert fatigue and its unintended consequences, data show that alert fatigue is significantly impacting the clinician experience and patient care [28][29][30][31]. At our large academic health system, the number of active interruptive alerts for providers grew from 13 in 2012 to 107 in 2018, an increase of more than 700%. In December 2018, our providers ordered the shingles vaccine in response to just 6.43% (2219/34,531) of the alerts, suggesting that our clinicians view a majority of these alerts as inappropriate. Consequently, an improved EHR experience for clinicians has become an institutional priority for many health systems, including our own.
Individual-level factors, including clinicians' bias toward ignoring alerts and poor signal detection resulting from the overwhelming number of alerts, together with poor alert reliability, add to the degraded effectiveness of CDS and user experience [32,33]. To optimize a CDS system means to optimize the signal-to-noise ratio of alerts by increasing the signal, decreasing the noise created by an abundance of inappropriate, poorly timed alerts, or both. To this end, prior work on medication alerts and monitoring alarms has implemented advanced interventions that use rules to surface or suppress alerts, with the aim of improving CDS alert signal. A study using basic rules to deactivate irrelevant alerts and manually alter other alert frequencies based on severity decreased the override rates from 33.6 to 4.6 per 100 orders [34]. Similar severity ranking showed success in increasing alert acceptance rate by 50%, despite a 60% increase in alert events [35]. Other studies attempting to reduce noise, however, achieved limited or mixed results [36][37][38][39][40][41][42][43][44][45].
Research indicates that delivering alerts at the appropriate time and place in the clinical workflow is key to effective CDS [5][6][7][8]. Prior work to optimize CDS tools has focused on manual approaches [4,46]; these have proven to be time consuming, difficult to maintain, and static, limiting scalability. Optimization and incorporation of more sophisticated rules to surface or suppress alerts have achieved only limited reduction [36][37][38][39][40][41]. Machine learning (ML) is a powerful tool for identifying patterns in complex data by using past data to predict future performance. The use of ML in health care has proliferated over the past 10 years in a variety of use cases. ML applied to EHR data specifically shows promise as a tool for improving safety and quality of care; its application to problems such as predicting readmission and sepsis shows the ability of ML to better target alerts to the appropriate user and use case [47][48][49]. An EHR-integrated ML algorithm is a potentially powerful tool to improve the quality of care by increasing the signal-to-noise ratio of alerts to positively impact clinicians' interactions with these alerts. To date, the informatics literature lacks both prospective evaluations of signal-to-noise optimization interventions and detailed accounts of the operational steps necessary to implement ML models in clinical care. Using the shingles vaccination alert as our initial use case, we leveraged historical EHR interaction data (clicks) and patient and provider sociodemographic data to (1) build and train an ML model that can predict the likelihood of provider interaction with the shingles vaccination alert and (2) establish the data architecture necessary to deploy the model in a live environment.

Objective
The objective of this case study was to describe the development, implementation, and prospective evaluation of a novel, ML-based CDS signal-to-noise optimization (SmartCDS) system that suppresses low-value alerts, applied here to a shingles vaccination CDS alert.

Setting
This work was conducted within a large urban academic hospital system with approximately 1300 beds across several satellite locations. In the fiscal year 2016, 3584 doctors and 4899 nurses handled approximately 38,000 inpatient admissions, 5.8 million outpatient visits, and 150,000 emergency department visits.

Data
This study uses all data from January 1, 2017, to March 11, 2019, to maintain consistency with the shingles alert content and its clinical setting. The dataset includes a total of 695,311 shingles alerts presented to 780 providers over 327,737 encounters, covering 144,438 unique patients. The overall alert interaction rate (any action toward acknowledging the shingles alert in an encounter) during this period was 16%, and the overall order rate of the shingles vaccine in response to the alert was 5%. The alert response options are illustrated in Figure 1. Providers may choose from four different actions: open SmartSet to sign vaccine orders for targeted patients, health maintenance override, postpone or customize health maintenance modifier based on refusal, and deferral or other decisions made by patients. The alert appears on a side tab located at the right side of the EHR interface and may also be ignored or closed when no action is taken.

Feature Construction
A retrospective data query was performed to extract data related to the shingles vaccine alert. Key data elements to be extracted from the EHR were determined using a combination of descriptive analysis and clinical expertise. After data cleaning, historical changes in alert usage were analyzed to determine the optimal period from which data can be extracted for model training (two years of data from January 1, 2017, to December 31, 2018). Using data from our alert system, the average response rates for the alerts as well as the providers' interaction history with the alerts were examined to determine an appropriate protocol for assigning one unique provider to each alert encounter. Initial analyses demonstrated a large variation among clinicians in the frequency of interaction with the alerts (0%-92%), prompting our team to construct variables for an individual clinician's activity history; these were expanded to several short-term and long-term activity history variables capturing response rates and alert volume, alongside demographic variables for both clinicians and patients. The features that affect a clinician's response include (1) clinician-level demographics, clinical roles, and specialties; (2) response rate to previous shingles alerts (both short-term and long-term); and (3) the number of recent encounters. The patient-level data included patient demographics and the history of targeted shingles alert responses and shingles vaccine orders by clinicians. In addition, a binary flag distinguishing walk-in visits from scheduled office visits was included, as the architecture did not capture walk-in visits in our pilot implementation.
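The short-term and long-term activity history variables described above can be illustrated with a minimal sketch. The field names (`clinician`, `day`, `interacted`), the 30-day short-term window, and the function names are assumptions made for illustration; the exact window lengths and feature definitions used in the study are not given here. The sketch only uses alerts from strictly earlier days, so that an alert's own outcome never leaks into its features.

```python
from collections import defaultdict
from datetime import date


def _rate(events):
    """Fraction of past alerts that were interacted with (0.0 if no history)."""
    return sum(1 for _, hit in events if hit) / len(events) if events else 0.0


def clinician_history_features(alerts, short_window_days=30):
    """Build per-alert history features from strictly earlier days only.

    `alerts` is an iterable of dicts with the hypothetical keys
    "clinician", "day" (datetime.date), and "interacted" (bool).
    The 30-day default window is an assumption, not a value from the study.
    """
    history = defaultdict(list)  # clinician -> [(day, interacted), ...]
    rows = []
    for alert in sorted(alerts, key=lambda a: a["day"]):
        who, day = alert["clinician"], alert["day"]
        # Exclude same-day alerts so a label cannot leak into its own features.
        past = [(d, hit) for d, hit in history[who] if d < day]
        recent = [(d, hit) for d, hit in past
                  if (day - d).days <= short_window_days]
        rows.append({
            "clinician": who,
            "day": day,
            "long_term_rate": _rate(past),     # response rate over all history
            "short_term_rate": _rate(recent),  # response rate in recent window
            "recent_alert_count": len(recent), # recent alert volume
        })
        history[who].append((day, alert["interacted"]))
    return rows
```

Rows are returned in date order; a clinician with no prior alerts receives rates of 0.0, which a production system might instead encode as missing.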

Machine Learning Model
The model was designed as a binary classification task. The target labels were built based on whether an alert instance was interacted with or whether a follow-up order for shingles vaccination was placed in each primary care visit. The data were split randomly by individual clinician into 80%, 10%, and 10% sets for model training, validation, and testing, respectively, as illustrated in Figure 2. XGBoost was employed as our ML algorithm, with learning rate=0.3, maximum tree depth=6, minimum child weight=1, no subsampling, negative log loss, and early stopping (with a maximum of 50 rounds). The validation set was used to monitor the model training through early stopping, to derive the operational score threshold, and to evaluate the model performance; the test set was used to evaluate the effectiveness of the score threshold and the generalizability of the trained model retrospectively. To evaluate the performance of the model, we obtained a sample of nearly 65,000 primary care visits. The model, which adopts individual profiling of providers to reduce the number of clinically insignificant alerts, was highly effective, with an average area under the receiver operating characteristic curve of 0.919 and an average area under the precision-recall curve of 0.562 using 5-fold cross-validation. Our simulation found that of the 50.00% (6490/12,980) lowest ranked vaccination alerts, 99.77% (6475/6490) would have been ignored by providers if not suppressed. Given that the corresponding estimated order reduction via nested cross-validation was deemed conservative at 1%, a 50% suppression threshold was selected in collaboration with clinical stakeholders [50]. Because the model relies on personal history for its features, it is updated daily during prospective implementation to incorporate the latest data and to update the 50% score threshold. Upcoming appointments are later added to the training data, so the training window keeps growing.
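Splitting by clinician rather than by encounter keeps every encounter of a given clinician in a single partition, so the validation and test sets contain only clinicians the model has not seen. The following is a minimal sketch of such a grouped 80%/10%/10% split; the function names, the `clinician` key, and the fixed seed are illustrative assumptions and not the study's code.

```python
import random


def split_clinicians(clinician_ids, seed=0, train_frac=0.8, val_frac=0.1):
    """Randomly partition unique clinician IDs into train/validation/test."""
    unique = sorted(set(clinician_ids))
    random.Random(seed).shuffle(unique)  # seeded for reproducibility
    n_train = round(train_frac * len(unique))
    n_val = round(val_frac * len(unique))
    return {
        "train": set(unique[:n_train]),
        "validation": set(unique[n_train:n_train + n_val]),
        "test": set(unique[n_train + n_val:]),
    }


def split_encounters(encounters, groups):
    """Route each encounter to the partition that owns its clinician."""
    out = {name: [] for name in groups}
    for enc in encounters:
        for name, members in groups.items():
            if enc["clinician"] in members:
                out[name].append(enc)
                break
    return out
```

With 780 clinicians, this yields partitions of 624, 78, and 78 clinicians. Because clinicians differ in alert volume, the encounter counts per partition will only approximate 80/10/10.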
The patients' appointments for initial primary care visits are excluded from suppression.
Figure 2. Experimental design. All data from January 1, 2017, to March 11, 2019, are used to train the model. Data are divided based on clinicians into 80%, 10%, 10% splits as train, validation, and test set, respectively. Each day, the data from the previous day are added to the dataset and the model is retrained and evaluated on the updated validation set to derive the 50% suppression score threshold. Predictions are made on upcoming visits (appointments) following the shingles best practice advisory (BPA) eligibility in the next day, and BPA instances are suppressed if the predicted score is lower than the threshold. Upcoming visits, which are logged into the shingles BPA log in the electronic health record system, are used for training in the future. In this design, the training window is always growing, with January 2017 as the start date. ST: score threshold.
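The daily threshold step can be sketched as follows: the score threshold is taken as the cut point below which the chosen fraction (here 50%) of validation scores fall, and an upcoming appointment is suppressed only if its predicted score is lower than that threshold and it is not an initial primary care visit. The function names and the `id`, `score`, and `is_initial_visit` keys are hypothetical, and the percentile rule shown is one plausible reading of "50% suppression score threshold" rather than the study's exact procedure.

```python
def score_threshold(validation_scores, suppress_fraction=0.5):
    """Return the score below which `suppress_fraction` of validation scores fall."""
    if not validation_scores:
        raise ValueError("validation_scores must not be empty")
    ordered = sorted(validation_scores)
    k = int(len(ordered) * suppress_fraction)
    # Scores strictly below ordered[k] form the lowest-ranked k alerts (absent ties).
    return ordered[min(k, len(ordered) - 1)]


def decide_suppression(appointments, threshold):
    """Map appointment id -> True if its alert should be suppressed."""
    return {
        appt["id"]: (appt["score"] < threshold and not appt["is_initial_visit"])
        for appt in appointments
    }
```

In the daily cycle described in the caption, `score_threshold` would be recomputed on the updated validation set after each retraining and then applied to the next day's appointments.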

Pilot Design and Evaluation
A pilot study was designed over 6 weeks (Figure 3) in biweekly cycles (alternating turning the model on for one week and turning the model off for another week) to verify that the data distribution in the training/validation set was applicable to that in production. In the pilot, key statistical measures were examined with the model both turned on and turned off to compare prospective model performance with estimates generated in retrospective evaluation. The provider response and follow-up orders associated with suppressed shingles alerts cannot be measured; therefore, prospective model performance was evaluated using the percentage of daily suppressed alerts, daily alert response rate, weekly shingles vaccination order count, and alerts per order rate. Weekly aggregated measures were employed because of weekly patterns detected in the clinical setting (eg, Wednesdays and weekend days featured lower alert volume).
Figure 3. [...] (2) The data extraction module will query the reporting database (Epic Clarity) and feed (3) encounter, provider, user demographic, and best practice advisory data to the ML model. (4) The model output is then both stored in a local database for further analysis and pushed (5) to Epic through an Epic Interconnect Web Services endpoint. From here, information about what alert per encounter should be suppressed is written (6) into the Epic event database (Chronicles) through a SmartData element. An alert rule will inspect (7) these data to allow or suppress the alert being fired. BPA: best practice advisory; ML: machine learning.

Architecture for Signal-to-Noise Optimization System Deployment
After the construction of the ML model for alert suppression (see Methods), we built a new data architecture to operationalize the model (Figure 3). The overall signal-to-noise optimization (SmartCDS) system was broken into three components: (1) the data extraction module, which identifies planned visits for the next day, with the intent to identify upcoming vaccine alerts to suppress, and queries the EHR to extract the variables required to run the ML module; (2) the suppression ML model itself (the ML module built as described above); and (3) the suppression module, which leverages a series of application programming interface calls to the EHR to communicate the alerts that should be suppressed. The features extracted by the data extraction module are then passed to the ML module.
The steps, related tasks, and timeline for the development and operationalization of the system are detailed in Table 1.
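The three-component design (extraction, scoring, suppression) can be sketched as a single daily cycle in which each component is passed in as a callable. Everything named here (`Visit`, `run_daily_cycle`, and the `extract`, `score`, and `push` callables) is hypothetical; the actual system queries the EHR reporting database and calls an EHR Web service, neither of which is reproduced in this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class Visit:
    """One planned visit for the next day, with its model features."""
    encounter_id: str
    features: Dict[str, float]


def run_daily_cycle(
    extract: Callable[[], Iterable[Visit]],
    score: Callable[[Dict[str, float]], float],
    push: Callable[[List[str]], None],
    threshold: float,
) -> List[Tuple[str, float, bool]]:
    """Run extraction, scoring, and suppression once; return a decision log."""
    log = []
    for visit in extract():                 # (1) data extraction module
        s = score(visit.features)           # (2) ML module
        log.append((visit.encounter_id, s, s < threshold))
    to_suppress = [eid for eid, _, suppress in log if suppress]
    push(to_suppress)                       # (3) suppression module
    return log                              # kept for local storage and monitoring
```

Keeping the three components behind plain callables is one way to obtain the modularity described in the text: the scoring model or the EHR endpoint can be swapped without touching the cycle itself.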

Aggregation of Production Data and Configuration of a Web Service Endpoint
With the alert suppression model built and data aggregated to support production, we worked with our institutional EHR team to create predefined and operationally approved queries to build easy-to-access views of our variables of interest in the EHR database. This enabled and automated the data extraction needed to operationalize the SmartCDS system. We then established a local database to serve as a repository for monitoring and tracking data runs, reports, and system errors per best practices. To complete our work on creating the technical capacity necessary to implement the SmartCDS system, we worked with the enterprise information technology team to determine the appropriate Web service to call; we then created the rules necessary to appropriately respond to the data sent to that endpoint and, if appropriate, suppress the target alert (in this case, the shingles vaccination alert).

Optimization of Machine Learning Script and Docker Image and Formation of Container
Once the SmartCDS system was built, we validated it with the shingles alert. Predictions are made on upcoming appointments by applying the model to our predefined views, modifying the ML script, and generating model predictions (suppress yes or no), which are saved in a local database and applied to suppress any alert with a predicted score less than the threshold (additional details in Methods). The system was designed to be modular and orthogonal with regard to call frequency, instrumentation, and configuration, allowing for easy adaptation to new environments. Under these parameters, we formed Docker containers (standard units of software that package code and all its dependencies, so each application runs quickly and reliably from one computing environment to another) to appropriately configure the Web service, define the frequency and timing of functionality, and log the system's normal and error events.

Reporting Dashboard Development
A dashboard was developed to ensure that the system was running properly with a stable performance of the model from a safety and operational perspective and to monitor process outcomes of interest. The dashboard features process outcomes of interest (eg, suppression percentage, daily order counts, and daily alert volume) and factors in timing and frequency of report delivery based on feedback from clinical and operational stakeholders. Daily logging and weekly monitoring reports (Figure 4 and Multimedia Appendix 1) were constructed to enable detection of abnormal model behavior related to data shifts or model failures. If, on any date, the alert-related volume diverges from previous patterns, the alert log stream along with the related EHR data can be examined to locate the source of the anomaly.
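A simple way to flag a day whose alert-related volume diverges from previous patterns is to compare it with a trailing window of earlier days. The sketch below uses a z-score rule with a 14-day window and a limit of 3 standard deviations; both values, and the function name, are illustrative assumptions and not the monitoring rule used in the study.

```python
from statistics import mean, stdev


def flag_anomalies(daily_values, window=14, z_limit=3.0):
    """Return indices of days that deviate strongly from the trailing window."""
    flagged = []
    for i in range(window, len(daily_values)):
        baseline = daily_values[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma == 0:
            # A flat baseline: any change at all is treated as a divergence.
            if daily_values[i] != mu:
                flagged.append(i)
        elif abs(daily_values[i] - mu) / sigma > z_limit:
            flagged.append(i)
    return flagged
```

Because alert volume follows a weekly pattern, a refinement would be to compare each day only with the same weekday in earlier weeks, or to monitor weekly aggregates instead of daily values.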

Daily Alert-Related Volumes
We leveraged the 6 weeks of data (January 19-March 11, 2019) to compare the differences in shingles alert count, interacted alert count, and order count between weeks with the model turned on and weeks with the model turned off. We observed a 42.2% reduction in the alert count (3541.0 with the model turned on vs 6123.7 with the model turned off) and no significant reduction in the interacted alert count (one-sided two-sample t test; P=.20) or in the order count (one-sided two-sample t test; P=.38) during the 6-week biweekly cycle (Table 2).

Alerts per Order Rate and Signal-to-Noise Ratio
Our 6-week pilot deployment of the system in the live environment indicates an alert suppression rate of 43.7% out of 35,315 appointments (Figure 4), slightly lower than the predefined 50% threshold, with stable shingles vaccine order volume (no statistically significant difference between active and inactive suppression). Initial inspection showed that, on average, walk-in visits had a higher alert ignored rate (91%) than scheduled office visits (87%) in 2017 and 2018. As the model only operated on scheduled appointments in this study and the activity history has the highest weight toward a suppression decision, a suppression rate slightly lower than 50% was expected.
The ratio of alerts fired to orders placed with the model turned on was almost half of the ratio with the model turned off, whereas the ratio of interacted alerts per order placed remained the same. By mapping the average orders placed as the average power of signal and the average count of ignored alerts with no follow-up orders as the average power of noise, the signal-to-noise ratio changed from 5.7% to 10.1%, a 78% increase. Furthermore, by mapping the interacted alerts (including follow-up orders) as signal and the ignored alerts with no follow-up actions as noise, the signal-to-noise ratio changed from 23.4% to 46.1%, a 97% increase.
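These ratios can be approximately reproduced from the weekly averages reported elsewhere in the paper (alerts 3541.0 vs 6123.7, interactions 1118.3 vs 1166.3, and orders 326.3 vs 331.3 with the model on vs off). The sketch below assumes that noise is the alert count minus the signal count in each mapping; this definition and the function name are assumptions, chosen because they yield about 5.7% and 10.1% for the order-based ratio. The interaction-based ratio comes out at about 23.5% and 46.2%, close to the reported 23.4% and 46.1%, with the small gap presumably due to rounding or to how the averages were aggregated.

```python
def ratio_metrics(alerts, interactions, orders):
    """Compute alerts-per-order and two signal-to-noise ratios from average counts."""
    return {
        "alerts_per_order": alerts / orders,
        "interactions_per_order": interactions / orders,
        # Signal = orders; noise = alerts that did not lead to an order (assumed).
        "snr_orders": orders / (alerts - orders),
        # Signal = interacted alerts; noise = alerts with no interaction (assumed).
        "snr_interactions": interactions / (alerts - interactions),
    }


model_on = ratio_metrics(alerts=3541.0, interactions=1118.3, orders=326.3)
model_off = ratio_metrics(alerts=6123.7, interactions=1166.3, orders=331.3)

for name in model_on:
    print(f"{name}: on={model_on[name]:.3f} off={model_off[name]:.3f}")
```

Under the same assumption, alerts per order fall from roughly 18.5 with the model off to roughly 10.9 with the model on, while interactions per order stay near 3.5 in both conditions.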

Principal Findings
This paper describes the steps and considerations involved in the development and implementation of an ML model for suppressing low-value shingles vaccination alerts in the EHR. As predicted in our simulation, validation of this signal-to-noise optimization (SmartCDS) system demonstrated a substantial reduction in shingles vaccine alerts at limited cost to vaccine ordering. The rate of daily alert interaction among individual clinicians during the 6-week pilot was higher with the model turned on than with the model turned off. This result was expected because of the 42.2% lower volume of shingles alerts observed with stable daily alert interactions. Interestingly, the overall interaction rate gradually decreased over the 6-week cycle (Figure 4). This finding is consistent with reports that responsiveness to alerts tends to decrease over time [29]. During the 6-week pilot, the profile of the providers who accepted the alerts did not change, suggesting that the profile of patients who are offered the vaccination did not change either; this will be confirmed in our follow-up studies.
To date, our literature review indicates that our work is the first to develop an ML-based system to suppress clinically insignificant alerts or alerts unlikely to be accepted and to prospectively evaluate it in a large-scale health care system. Relevant literature to date has been limited to retrospective studies focused on identifying false-positive or clinically insignificant physiologic monitor alarms (false alarms). In 2015, PhysioNet opened a challenge to reduce false arrhythmia alarms using a subset of the Medical Information Mart for Intensive Care II waveform database [51]. The best models showed that by allowing 30 seconds of delay, false alarms can be better distinguished from true alarms; these models achieved an 80% reduction in false alarms while missing 1% of true alarms. Studies focusing on pulse oximetry to reduce peripheral capillary oxygen saturation (SpO2) false alarms, intracranial pressure alarms, and general vital sign monitoring alarms found mixed results ranging from 25% to 47% in alarm reduction, with 0% to 5% false-negative rates [42][43][44][45]. A more recent study showed that, by allowing delays of up to 3 min for alarms with physiologic monitoring waveforms and by including electrocardiography, SpO2, and arterial blood pressure, an ML model can achieve slightly better performance but fails to stably generalize to unseen data [52].
The development of a robust reporting structure allows for the logging and monitoring of the system and its impact on clinical outcomes, which are necessary to ensure the stability and safety of the system. Future work will involve gathering feedback from front-line stakeholders to support the adaptation of the signal-to-noise optimization system to other alerts, enabling the system to ingest real-time data, further developing a reporting dashboard with effective, user-centered data displays, and establishing a systematic process for setting organizationally acceptable thresholds for alert suppression.

Limitations
During the pilot implementation and evaluation, the model only operated on scheduled office visits because of infrastructure gaps restricting the ability to incorporate walk-in visits. We are working to address this gap to be able to assess the effectiveness and impact of this model on a global level. In addition, it is possible that clinicians will start to adjust to the volume change in the shingles alert delivery, leading to less responsiveness and less ordering. As potential external or systematic biases, such as seasonal effects, could lead to inaccurate observations and conclusions, we will implement a more comprehensive statistical evaluation after updating the infrastructure to systematically address these potential biases.

Conclusions
Our model presented high discriminatory power in the initial prospective evaluation of shingles alert interactions. Our approach was effective in suppressing unnecessary alerts, with limited reduction in overall order volume. This work also provides potential evidence of an increase in the rate of interactions and orders per alert (ie, an increase in signal-to-noise ratio) achieved by decreasing noise (ie, through suppression). In addition, the process built to operationalize this new ML tool may prove to be a useful model for enabling the deployment of this type of tool across many use cases. Future efforts include applying this approach globally to other EHR alerts and conducting comprehensive randomized controlled trials.