Developing a Machine Learning Model to Predict Severe Chronic Obstructive Pulmonary Disease Exacerbations: Retrospective Cohort Study

Background: Chronic obstructive pulmonary disease (COPD) poses a large burden on health care. Severe COPD exacerbations require emergency department visits or inpatient stays, often cause an irreversible decline in lung function and health status, and account for 90.3% of the total medical cost related to COPD. Many severe COPD exacerbations are deemed preventable with appropriate outpatient care. Current models for predicting severe COPD exacerbations lack accuracy, making it difficult to effectively target patients at high risk for preventive care management to reduce severe COPD exacerbations and improve outcomes.

Objective: The aim of this study is to develop a more accurate model to predict severe COPD exacerbations.

Methods: We examined all patients with COPD who visited the University of Washington Medicine facilities between 2011 and 2019 and identified 278 candidate features. By performing secondary analysis on 43,576 University of Washington Medicine data instances from 2011 to 2019, we created a machine learning model to predict severe COPD exacerbations in the next year for patients with COPD.

Results: The final model had an area under the receiver operating characteristic curve of 0.866. When using the top 9.99% (752/7529) of the patients with the largest predicted risk to set the cutoff threshold for binary classification, the model gained an accuracy of 90.33% (6801/7529), a sensitivity of 56.6% (103/182), and a specificity of 91.17% (6698/7347).

Conclusions: Our model provided a more accurate prediction of severe COPD exacerbations in the next year compared with prior published models. After further improvement of its performance measures (eg, by adding features extracted from clinical notes), our model could be used in a decision support tool to guide the identification of patients with COPD and at high risk for care management to improve outcomes.

International Registered Report Identifier (IRRID): RR2-10.2196/13783
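The accuracy, sensitivity, and specificity reported in the Results follow directly from the confusion-matrix counts implied by the stated fractions. The Python sketch below recomputes them from those counts and also illustrates one plausible way to turn a "top fraction of patients by predicted risk" into a cutoff threshold. The function names and the rounding rule used to pick the cutoff are illustrative assumptions, not the study's code.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "flagged_fraction": (tp + fp) / total,
    }


def top_fraction_cutoff(risks, fraction):
    """Assumed rule: the cutoff is the k-th largest predicted risk,
    where k = round(n * fraction) and at least 1."""
    k = max(1, round(len(risks) * fraction))
    return sorted(risks, reverse=True)[k - 1]


# Counts implied by the abstract: sensitivity 103/182, specificity 6698/7347.
tp, fn = 103, 182 - 103
tn, fp = 6698, 7347 - 6698
m = confusion_metrics(tp, fn, tn, fp)
```

Here `tp + fp` equals 752, the number of patients flagged as high risk, which matches the top 9.99% (752/7529) used to set the threshold.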

Acceptable range: Age: 40-122 years [102].
Number of features: 6.

Laboratory test
Minimum alpha-1 antitrypsin (A1AT) level; maximum A1AT level; whether the minimum A1AT level is abnormally low; minimum arterial oxygen saturation (SaO2) level [69]; maximum arterial partial pressure of carbon dioxide (PaCO2) level; minimum PaCO2 level; maximum arterial partial pressure of oxygen (PaO2) level; minimum PaO2 level; maximum blood eosinophil count; maximum percentage of blood eosinophils; maximum blood neutrophil count [65,66]; maximum percentage of blood neutrophils; maximum C-reactive protein (CRP) level [64]; whether the maximum CRP level is abnormally high; maximum hematocrit (Hct) level; minimum Hct level [67]; whether the maximum Hct level is abnormally high; whether the minimum Hct level is abnormally low; maximum hemoglobin A1c (HbA1c) level; maximum hemoglobin (Hgb) level [68]; minimum Hgb level; whether the maximum Hgb level is abnormally high; whether the minimum Hgb level is abnormally low; whether an immunoglobulin E (IgE) test was performed; maximum total serum IgE level; whether the maximum total serum IgE level is abnormally high; maximum red blood cell count [63]; maximum white blood cell count [18,60]; no. of laboratory tests; no. of days from the last laboratory test; and no. of laboratory tests having abnormal results.
Number of features: 31.

Vital sign
Maximum body mass index (BMI) [70]; the relative change of BMI = (the last logged BMI / the first logged BMI - 1) × 100%; maximum diastolic blood pressure; average diastolic blood pressure; maximum heart rate [71]; average heart rate; maximum height; minimum peak expiratory flow [61]; average peak expiratory flow; minimum peripheral capillary oxygen saturation (SpO2) [28]; average SpO2; maximum respiratory rate; average respiratory rate; maximum systolic blood pressure; average systolic blood pressure; maximum temperature; average temperature; and the relative change of weight = (the last logged weight / the first logged weight - 1) × 100%.
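The relative-change features for BMI and weight share one formula, (last logged value / first logged value - 1) × 100%. A minimal Python sketch of it follows. The function name is illustrative, and returning `None` for a missing or zero first value is an assumption, since the source does not say how such cases were handled.

```python
def relative_change_percent(first, last):
    """Relative change in percent: (last / first - 1) * 100.

    Returns None when either value is missing or the first value is zero
    (an assumed convention, not specified in the source).
    """
    if first is None or last is None or first == 0:
        return None
    return (last / first - 1) * 100


# Hypothetical patient whose BMI went from 30.0 to 27.0 over the period.
bmi_change = relative_change_percent(30.0, 27.0)
```

A negative result indicates a decrease over the observation period, and a positive result indicates an increase.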

Medication
Total no. of COPD medications ordered; no. of COPD medication orders; total no. of distinct COPD medications ordered; total no. of COPD medication refills allowed; total no. of units of COPD medications ordered; no. of COPD reliever orders; total no. of medications in COPD reliever orders; total no. of distinct medications in COPD reliever orders; total no. of refills allowed for COPD relievers; total no. of units of COPD relievers ordered; no. of COPD controller orders; total no. of medications in COPD controller orders; total no. of distinct medications in COPD controller orders; total no. of refills allowed for COPD controllers; total no. of units of COPD controllers ordered; whether a nebulizer was used; no. of nebulizer medication orders; total no. of medications in nebulizer medication orders; total no. of distinct medications in nebulizer medication orders; total no. of refills allowed for nebulizer medications; total no. of units of nebulizer medications ordered; whether a spacer was used; no. of medication orders; total no. of medications ordered; total no. of distinct medications ordered; total no. of units of medications ordered; total no. of medication refills allowed; total no. of short-acting muscarinic antagonists (SAMA) ordered; total no. of refills allowed for SAMA; total no. of units of SAMA ordered; total no. of inhaled corticosteroids (ICS) ordered; total no. of refills allowed for ICS; total no. of units of ICS ordered; total no. of short-acting beta-2 agonists (SABA) ordered; total no. of refills allowed for SABA; total no. of units of SABA ordered; total no. of systemic corticosteroids ordered; total no. of refills allowed for systemic corticosteroids; total no. of units of systemic corticosteroids ordered; total no. of long-acting beta-2 agonists (LABA) ordered; total no. of refills allowed for LABA; total no. of units of LABA ordered; total no. of long-acting muscarinic antagonists (LAMA) ordered; total no. of refills allowed for LAMA; total no. of units of LAMA ordered; total no. of phosphodiesterase-4 inhibitors (PDE-4) ordered; total no. of refills allowed for PDE-4; total no. of units of PDE-4 ordered; total no. of ICS and LABA combinations ordered; total no. of ICS, LABA, and LAMA combinations ordered; total no. of LABA and LAMA combinations ordered; and total no. of SABA and SAMA combinations ordered.
COPD medication categories [3,112]:
• Short-term relievers: systemic corticosteroid [50]; short-acting muscarinic antagonist (SAMA) [50]; short-acting beta-2 agonist (SABA); SABA and SAMA combination; and mucolytic agent.

Insurance
Computed based on the end of the index year: whether the patient had any private insurance; whether the patient had any public insurance; and whether the patient was self-paid or paid for by a charity.
Number of features: 3.

Encounter
No. of all types of encounters; no. of major encounters for COPD [28,50]; no. of outpatient visits; no. of outpatient visits with a primary diagnosis of COPD; no. of emergency department (ED) visits; average length of stay of an ED visit; no. of ED visits related to COPD [28]; no. of inpatient stays; total length of inpatient stays; average length of an inpatient stay; no. of inpatient stays related to acute COPD exacerbation or respiratory failure; no. of encounters related to acute COPD exacerbation or respiratory failure [18,60]; no. of outpatient visits to the patient's primary care provider (PCP); no. of admissions to the intensive care unit; admission type of the most emergent encounter (elective, urgent, emergency, or trauma); admission type of the last encounter (elective, urgent, emergency, or trauma); type of the last encounter (ED visit, outpatient visit, or inpatient stay); type of the first encounter (ED visit, outpatient visit, or inpatient stay) related to COPD in the data set; length of stay of the last ED visit; no. of ED visits in the past 6 months; no. of inpatient stays in the past 6 months; and no. of major encounters for COPD in the past 6 months.
Number of features: 22.
A major encounter for COPD was defined as an ED visit having a COPD diagnosis code, an inpatient stay having a COPD diagnosis code, or an outpatient visit having a primary diagnosis of COPD. All else being equal, a patient with ≥1 major encounter for COPD is more likely to have severe COPD exacerbations in the future than a patient who has only outpatient visits with COPD as a secondary diagnosis.
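The definition of a major encounter for COPD is a three-way rule on encounter type and diagnosis position, which the following Python sketch expresses. The `Encounter` record and its field names are hypothetical. The source does not describe how encounters or diagnosis codes are stored.

```python
from dataclasses import dataclass


@dataclass
class Encounter:
    # Hypothetical record layout, for illustration only.
    kind: str                        # "ED", "inpatient", or "outpatient"
    has_copd_diagnosis: bool         # any COPD diagnosis code on the encounter
    primary_diagnosis_is_copd: bool  # COPD is the primary diagnosis


def is_major_copd_encounter(e):
    """ED visit or inpatient stay with a COPD diagnosis code,
    or outpatient visit with a primary diagnosis of COPD."""
    if e.kind in ("ED", "inpatient"):
        return e.has_copd_diagnosis
    if e.kind == "outpatient":
        return e.primary_diagnosis_is_copd
    return False


def count_major_copd_encounters(encounters):
    """The 'no. of major encounters for COPD' feature over a list of encounters."""
    return sum(is_major_copd_encounter(e) for e in encounters)


history = [
    Encounter("ED", True, False),          # major: ED visit with a COPD code
    Encounter("outpatient", True, False),  # not major: COPD is only secondary
    Encounter("outpatient", True, True),   # major: COPD is primary
    Encounter("inpatient", False, False),  # not major: no COPD code
]
n_major = count_major_copd_encounters(history)
```

An outpatient visit with COPD only as a secondary diagnosis is excluded, which matches the contrast drawn in the definition above.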
Visit status and appointment scheduling
The day of the week when the last ED visit started; the last encounter's discharge disposition location (home, left against medical advice, or other non-home location); no. of times the patient left against medical advice; no. of no shows; no. of cancelled appointments; no. of days since the last inpatient stay; no. of days since the last outpatient visit; no. of days since the last outpatient visit for COPD; no. of days since the last ED visit; no. of days since the last ED visit for COPD [62]; the shortest time between making the request and the actual visit among all encounters that occurred; no. of days between making the request and the actual visit of the last encounter; no. of visits having same day appointments; and whether the last inpatient stay came from the ED.
Number of features: 14.

Patient's care continuity degree
No. of distinct medication prescribers; no. of distinct COPD medication prescribers; no. of distinct providers seen in outpatient visits; no. of distinct PCPs of the patient; and no. of distinct ED locations the patient went to (including inpatient stays admitted from the ED).
Number of features: 5.
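Two recurring patterns in the visit status and care continuity features are "no. of days since the last ..." and "no. of distinct ...". The Python sketch below shows one way each could be computed. The helper names are illustrative, and measuring recency against a reference date such as the end of the index year is an assumption, because the source states that reference point only for the insurance features.

```python
from datetime import date


def days_since_last(visit_dates, reference_date):
    """Days from the most recent visit on or before reference_date.

    Returns None when the patient has no such visit (assumed convention).
    """
    past = [d for d in visit_dates if d <= reference_date]
    if not past:
        return None
    return (reference_date - max(past)).days


def count_distinct(values):
    """Number of distinct non-missing identifiers, e.g. prescriber IDs."""
    return len({v for v in values if v is not None})


# Hypothetical data: two ED visits and four medication orders in the index year.
ed_visits = [date(2018, 3, 1), date(2018, 12, 21)]
prescribers = ["dr_a", "dr_b", "dr_a", None]

days_since_ed = days_since_last(ed_visits, date(2018, 12, 31))
n_prescribers = count_distinct(prescribers)
```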

Procedure
No. of CPT and HCPCS procedure codes; no. of ICD-10 and ICD-9 procedure codes; no. of CPT procedure codes of the fractional exhaled nitric oxide test; no. of CPT procedure codes of spirometry; no. of HCPCS procedure codes of home oxygen therapy; no. of CPT and HCPCS procedure codes of influenza vaccination; and whether mechanical ventilation was recorded using ICD-10 and ICD-9 procedure codes.

Whether the patient was last recorded as a current smoker; whether the patient was last recorded as a former smoker; the last recorded no. of packs of cigarettes the patient consumed per day; the average no. of packs of cigarettes the patient consumed per day across all of the records; no. of years the patient had smoked based on the last record; whether the patient was ever documented as consuming alcohol; whether the patient consumed alcohol based on the last record; the last recorded no. of fluid ounces of alcohol the patient consumed per week; the average no. of fluid ounces of alcohol the patient consumed per week across all of the records; the last recorded no. of alcohol drinks the patient consumed per week; the average no. of alcohol drinks the patient consumed per week across all of the records; whether the patient took any illicit drug based on the last record; whether the patient was ever documented as taking any illicit drug; the last recorded no. of times the patient took illicit drugs per week; and the average no. of times the patient took illicit drugs per week across all of the records.

We used the xgb.save() function in the xgboost package of R to save our final XGBoost model to a file in binary format. This file is available at http://faculty.washington.edu/luogang/COPD_care_model_UW.