The Detection of Opioid Misuse and Heroin Use From Paramedic Response Documentation: Machine Learning for Improved Surveillance

Background: Timely, precise, and localized surveillance of nonfatal events is needed to improve response and prevention of opioid-related problems in an evolving opioid crisis in the United States. Records of naloxone administration found in prehospital emergency medical services (EMS) data have helped estimate opioid overdose incidence, including nonhospital, field-treated cases. However, as naloxone is often administered by EMS personnel for unconsciousness of unknown cause, attributing naloxone administration to opioid misuse and heroin use (OM) may misclassify events. Better methods are needed to identify OM. Objective: This study aimed to develop and test a natural language processing method that would improve identification of potential OM from paramedic documentation. Methods: First, we searched Denver Health paramedic trip reports from August 2017 to April 2018 for keywords (ie, "naloxone," "heroin," and both combined) and reviewed charts of identified reports to assess positive predictive value.


Background
The more than 47,000 opioid-involved overdose deaths in 2018 in the United States [1,2] insufficiently reflect the nonfatal burden associated with prescription opioid misuse and heroin use (OM) by an estimated 10.3 million people [3]. Timely, precise, and localized surveillance of nonfatal events is needed to define medical treatment trends related to OM and improve response and prevention of overdoses and other opioid-related problems.
Timely information sources about nonfatal opioid-related events include hospitals, emergency departments (EDs) [4], and prehospital emergency medical services (EMS). Paramedics routinely encounter patients with symptoms consistent with drug overdose and administer naloxone (an effective opioid antagonist) to reverse symptoms [5]. EMS data have helped estimate opioid overdose incidence, including nonhospital, field-treated cases [6-8]. Frequency of naloxone administration has positively correlated with opioid and heroin overdose-related ED visits [9] and fatal opioid overdose rates [10], suggesting that naloxone administration might be a relevant proxy to monitor need for interventions.
Opioid misuse and heroin use [11] refer to illicit use and nonmedical prescription opioid use for extended periods or for experience and feelings derived from the medication [12]. Naloxone, administered by paramedics to reverse opioid-induced respiratory depression [13,14], might serve as a potential OM sentinel, particularly when OM has resulted in an opioid overdose [5,9,10]. However, as naloxone is often administered by EMS personnel for unconsciousness of unknown cause, attributing naloxone administration to opioid overdose and OM may misclassify events as opioid-related. A study of EMS-administered naloxone reported poor sensitivity and low positive predictive value (PPV) for opioid overdose [15].

Objective
Better methods are needed to accurately identify opioid-related problems and trends of OM. To fill this gap, we sought to develop and test a natural language processing (NLP) method that would improve classification of OM among paramedic trip reports with documentation of naloxone administration or evidence of heroin use.

Setting
Denver Health's (DH) [16] Paramedic Division is the main provider of EMS for the city and county of Denver. Their record system adheres to the National Emergency Medical Services Information System data standard version 3.4.0 [17]. We processed the following variables for each trip report: free-text narratives, primary impressions, alcohol or drug use note, and list of medications administered by paramedics. Table 1 summarizes the 3 study phases.
The Quality Improvement Committee of DH, which is endorsed by the Colorado Multiple Institutional Review Board at the University of Colorado, Denver, determined that this work did not constitute human subjects research.

Phase 1: Assess Text String Search Approaches
Naloxone administrations have been previously used to flag potential OM resulting in opioid overdoses [5,9,10], and heroin use implies OM. To reduce the DH EMS dataset to a prescreened subset of all paramedic reports, we searched for the presence of the keywords naloxone (or narcan) among administered medications or heroin (or misspelled variations herion and heroine) in trip report narratives between August 1, 2017, and April 30, 2018. No opioid brand names (eg, Oxycontin or Tramadol) were used to identify opioid-related events. Trip reports that included the keywords were reviewed by 2 independent reviewers, both DH paramedics, to answer the question: "Is there narrative evidence (yes, no, or unsure) of illicit opioid use or prescription OM (ie, use beyond clinical needs, for extended periods, or for experience and feelings derived from the medication)?" If unsure, or when adverse events from opioids did not imply misuse, reviewers were to classify that report as negative. We hypothesized lower false-positive rates for the heroin vs naloxone methods because heroin use implies OM. To visualize trends, weekly potential OM paramedic trip report counts for each search approach were calculated. Pearson correlation coefficients (r) assessed correlation between weekly OM paramedic trip report counts by search approach and reviewer assessments.
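The keyword prescreen described above can be sketched as follows. This is a minimal Python illustration only (the study's actual processing was done in R), and the report field names are hypothetical stand-ins for the NEMSIS 3.4.0 variables used:

```python
import re

# Heroin keyword plus the misspelled variations named in the study.
HEROIN_RE = re.compile(r"\b(heroin|herion|heroine)\b", re.IGNORECASE)
# Naloxone keyword and its brand-name synonym among administered medications.
NALOXONE_TERMS = {"naloxone", "narcan"}

def flag_trip_report(report):
    """Return True if the trip report matches the naloxone or heroin prescreen.
    `report` is a dict with hypothetical keys 'medications' and 'narrative'."""
    meds = {m.lower() for m in report.get("medications", [])}
    naloxone_given = bool(meds & NALOXONE_TERMS)
    heroin_mentioned = bool(HEROIN_RE.search(report.get("narrative", "")))
    return naloxone_given or heroin_mentioned
```

Reports passing this filter would then go to the 2 independent reviewers for adjudication.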

Phase 2: Train and Test Supervised Machine Learning Classification
Trip reports with naloxone among administered medications or heroin in narratives, plus reviewer agreement regarding OM in the narrative, served as our reference standard classification for training and validation of machine learning models; trip reports without reviewer agreement were omitted (examples in Multimedia Appendix 1). We removed the blank space between words in all variables, except in narratives, to create single-text entities (ie, DenverHealth instead of Denver Health). We stemmed words and removed stop words (eg, the, a, or and). To prevent overfitting, an 80% training set and 20% test set were created. The training corpus was converted into a document-term matrix (terms as columns, documents as rows) describing the frequency of terms occurring in narratives. To classify trip reports (OM evidence: yes or no), we used NLP machine learning models available from the caret package [18] on R version 3.4.1 (ie, random forest, k-nearest neighbors, support vector machines, and L1-regularized logistic regression). Hyperparameters for each model were kept at their default configurations (ie, no hyperparameter tuning); candidate models were evaluated with 3 repeats of 5-fold cross-validation and then fit to the entire training set. We assessed performance of each model by calculating PPV, negative predictive value (NPV), true-positive rates (TPRs), true-negative rates (TNRs), and areas under the receiver operating characteristic curves (AUCs), and we selected the binary classification algorithm with the highest AUC for subsequent model assessment. Details can be found in authored R code in Multimedia Appendix 2.
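The authors' models were fit with R's caret package (their code is in Multimedia Appendix 2). The overall workflow, a document-term matrix, an 80/20 stratified split, and cross-validated evaluation by AUC, can be sketched in Python with scikit-learn purely as an illustration; the corpus below is a toy stand-in, stemming is omitted for brevity, and only one of the four model families (L1-regularized logistic regression) is shown:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (RepeatedStratifiedKFold, cross_val_score,
                                     train_test_split)
from sklearn.pipeline import make_pipeline

# Toy stand-in corpus; real narratives and labels came from
# reviewer-adjudicated trip reports.
narratives = ["pt found unresponsive heroin use suspected",
              "fall from ladder ankle injury"] * 20
labels = [1, 0] * 20

# 80% training set, 20% held-out test set, stratified by label.
X_train, X_test, y_train, y_test = train_test_split(
    narratives, labels, test_size=0.2, stratify=labels, random_state=0)

# Document-term matrix (stop words removed) feeding an L1-regularized
# logistic regression, left at default hyperparameters (no tuning).
model = make_pipeline(
    CountVectorizer(stop_words="english"),
    LogisticRegression(penalty="l1", solver="liblinear"))

# 3 repeats of 5-fold cross-validation on the training set, scored by AUC.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
auc_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")

# Refit the selected model on the entire training set.
model.fit(X_train, y_train)
```

In the study, this evaluation was repeated across the four model families and the classifier with the highest cross-validated AUC was carried forward to phase 3.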

Phase 3: Validate Performance Measures Across Approaches
We searched for presence of the keywords naloxone (or narcan) among administered medications or heroin (or misspelled variations herion and heroine) in narratives of unseen September 2018 trip reports. Resulting trip reports were manually assessed following the same methodology as in phase 1. We then applied the machine learning classifier selected in phase 2 of the study to the reduced dataset of September 2018 trip reports. We hypothesized that machine learning models would decrease false-positive classifications of the combined naloxone and heroin search method because the algorithm would have learned and benefited from agreement in human assessments in phase 1. Reviewers' assessment was used as a reference standard to calculate PPV for each approach.
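With reviewer assessment as the reference standard, the PPV reported for each approach reduces to the fraction of flagged reports that reviewers also judged positive for OM. A minimal sketch (the function name is ours, not the authors'):

```python
def positive_predictive_value(flags, reference):
    """PPV: of the reports an approach flagged positive, the fraction the
    reviewers (reference standard) also judged positive for OM."""
    true_pos = sum(1 for f, r in zip(flags, reference) if f and r)
    flagged = sum(1 for f in flags if f)
    return true_pos / flagged if flagged else float("nan")
```

For example, an approach that flags 3 reports of which reviewers confirm 2 has a PPV of 2/3.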

Phase 1 Findings
In total, 54,359 trip reports were filed, and 1.09% (594/54,359) indicated naloxone administration; reviewers agreed on assessment in 86.9% (516/594) of reports. Among trip reports with agreement, 56.6% (292/516) were considered to include information revealing OM. Combined results, where naloxone was administered by paramedics or heroin was mentioned in the narrative, accounted for 2.39% (1298/54,359) of trip reports. Reviewers agreed on potential OM assessment in trip reports in 89.83% (1166/1298) of these. Among trip reports with agreement, more than three-quarters (907/1166, 77.79%) included information consistent with OM.
Weekly counts of keyword mentions varied by approach; Figure 1 is annotated to show periods of divergent trends between weekly sums of flagged reports and those affirmed by reviewer assessment. The naloxone approach was not consistent with reviewer assessment trends (r=0.60); the heroin and combined approaches were consistent with reviewer assessment trends (r=0.88 and r=0.90, respectively).
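The trend comparison here amounts to computing Pearson's r between two weekly count series: reports flagged by a search approach and reports affirmed by reviewer assessment. A small sketch with invented weekly counts (the numbers below are hypothetical, not the study's data):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical weekly counts: reports flagged by one approach vs reports
# affirmed positive by reviewer assessment, over the same 8 weeks.
flagged_weekly = [12, 15, 9, 20, 17, 11, 14, 18]
reviewed_weekly = [10, 14, 8, 18, 15, 10, 13, 16]
r = pearson_r(flagged_weekly, reviewed_weekly)
```

An approach whose weekly counts rise and fall with the reviewer-affirmed counts yields r near 1, as with the heroin and combined approaches above.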

Phase 2 Findings
The reference standard used to train and test machine learning models included details of the 1166 naloxone- and heroin-flagged trip reports with reviewer assessment agreement in phase 1. Performance measures for each model are presented in Table 2. The machine learning classifier produced counts closer to those from reviewer assessment (Figure 2 shows counts for weeks 36 to 39 of 2018).

Principal Findings
This study sought to better understand documentation in paramedic trip reports as a tool to support more effective nonfatal OM surveillance. Accurate detection of potential OM events in survivors of EMS runs can reflect short-term trends in OM-related events at the community and national levels. These are potential leading indicators for assessing the nonfatal magnitude of the opioid crisis in an area.
Fluctuating supplies and introduction of powerful, illicitly manufactured opioids may rapidly change local morbidity and mortality patterns [19,20]. Availability of near real-time data of opioid-related problems from the field may guide prevention and intervention efforts of emergency responders, health care providers, and public health practitioners [4]. Our methods, similar to those used to identify opioid overdose risk [21], could be applied to enhance information accuracy of EMS data for state and local public health departments, an important goal in the Centers for Disease Control and Prevention (CDC) Emergency Response Cooperative Agreement [22].
Public health agencies in the United States are seeking data sources and data-driven indicators for early warning systems to identify medical consequences of misuse of prescription and illicit opioids [23]. Our study found that naloxone administrations inaccurately identified and underestimated opioid-related paramedic trip events in Denver. This result is compatible with recent findings that naloxone administration was a poor proxy for opioid overdose [15]. Our study also found that EMS-administered naloxone did not reflect trends (rise or fall) in OM-related EMS runs assessed by chart review. By itself, EMS naloxone administration was a poor stand-alone indicator and would benefit from additional information embedded in EMS records.
As a simple alternative, the keyword heroin increased the number of records flagged with potential OM more than 2.5-fold (from 63 flagged by the current standard [ie, naloxone administrations] to 171). This strategy flagged OM reports accurately, with only 1 false positive. The combined naloxone and heroin NLP search increased sensitivity but with substantial false positives. To improve this, we applied a machine learning algorithm that produced both higher sensitivity and specificity. This same tactic, previously employed to identify alcohol misuse in clinical notes of electronic health records [24], could be extended to include more opioid-related terms such as prescription opioid names. New studies should assess the effects of including records flagged by keywords such as heroin or opioid brand names in model training, testing, and validation.

Limitations
Two main limitations were present in this study. First, we used data from only 1 EMS system. Although DH paramedics adhere to a widely used data standard [17], implementation may vary between organizations. Second, calculating the probability that cases not flagged by NLP methods were truly negative (NPV) was not possible, as manual chart review of all trip reports would have required human effort beyond our capacity.