Comparing Precision Machine Learning With Consumer, Quality, and Volume Metrics for Ranking Orthopedic Surgery Hospitals: Retrospective Study

Background: Patients’ choices of providers when undergoing elective surgeries significantly impact both perioperative outcomes and costs. A variety of approaches are available to patients for evaluating different hospital choices.
Objective: This paper aims to compare differences in outcomes and costs between hospitals ranked using popular internet-based consumer ratings, quality stars, reputation rankings, average volumes, average outcomes, and precision machine learning–based rankings for hospital settings performing hip replacements in a large metropolitan area.
Methods: Retrospective data from 4192 hip replacement surgeries among Medicare beneficiaries in 2018 in the Chicago metropolitan area were analyzed for variations in outcomes (90-day postprocedure hospitalizations and emergency department visits) and costs (90-day total cost of care) between hospitals ranked through multiple approaches: internet-based consumer ratings, quality stars, reputation rankings, average yearly surgical volume, average outcome rates, and machine learning–based rankings. The average rates of outcomes and costs were compared between the patients who underwent surgery at a hospital using each ranking approach in unadjusted and propensity-based adjusted comparisons.
Results: Only a minority of patients (1159/4192, 27.6% to 2078/4192, 49.6%) were found to be matched to higher-ranked hospitals for each of the different approaches. Of the approaches considered, hip replacements at hospitals that were more highly ranked by consumer ratings, quality stars, and machine learning were all consistently associated with improvements in outcomes and costs in both adjusted and unadjusted analyses. The improvement was greatest across all metrics and analyses for machine learning–based rankings.
Conclusions: There may be a substantive opportunity to increase the number of patients matched to appropriate hospitals across a broad variety of ranking approaches. Elective hip replacement surgeries performed at hospitals where patients were matched based on patient-specific machine learning were associated with better outcomes and lower total costs of care.
(J Med Internet Res 2020;22(12):e22765) doi: 10.2196/22765


Introduction
Patients undergoing elective surgeries often seek information at different levels of granularity when choosing providers, ranging from institutions to practices to individual physicians. It is widely established that the choice of provider significantly impacts both perioperative outcomes and costs [1][2][3][4]. A variety of approaches are available to patients for evaluating different provider choices. These include consumer ratings, government quality ratings (eg, Centers for Medicare & Medicaid Services [CMS] stars), reputation rankings, and average volumes and outcomes [5][6][7][8][9][10][11][12][13][14][15][16][17]. Machine learning-based online tools are emerging as an alternative method of predicting provider performance that can factor patient-specific characteristics into provider rankings [17,18]. Unfortunately, there is little prior research into the comparative performance of these methods in predicting outcomes associated with different providers [19][20][21].
This study compares different approaches (consumer ratings, CMS quality stars, reputation rankings, and average volumes and outcomes) and personalized patient-specific rankings using machine learning (referred to as precision navigation) for predicting hospital performance, as measured by postprocedure hospitalization rate, emergency department (ED) visits, and total cost of care. The analyses were performed on data from Medicare fee-for-service (FFS) beneficiaries following elective hip replacement surgery. For each of the different approaches, 90-day outcomes for patients that were treated at highly ranked institutions were measured and compared to the outcomes for patients treated at institutions that were not highly ranked. Outcomes following elective hip replacement surgeries were studied because they are among the most commonly performed surgeries and because there is considerable variation in risk-adjusted performance across hospitals. Postprocedure hospitalization, ED visits, and total cost of care were used as outcome measures because they are quantifiable, meaningful to patients, and available for Medicare FFS populations.

Study Population
The study population comprised Medicare FFS beneficiaries undergoing elective hip replacements between 2013 and 2018 in the Chicago-Naperville-Arlington Heights, Illinois metropolitan area. Hip replacement surgeries were identified among these patients using standard procedure codes available in administrative claims data.
All hospitals in the greater Chicago metropolitan area that were visited for hip replacement surgery by at least one Medicare FFS beneficiary who lived in the Chicago-Naperville-Arlington Heights, Illinois area between 2016 and 2018 were included in the analysis (thus, for all hospitals that were active between 2016 and 2018, all data from between 2013 and 2018 were used in the analysis). For each patient, up to 10 of the nearest candidate hospitals that were a maximum of 50 miles from their place of surgery were ranked. For each patient, the top-ranked choice was determined for the different approaches, and the average outcomes and costs for patients who had surgeries at a top-ranked hospital were compared with the population averages.
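The candidate-selection rule described above (up to 10 of the nearest hospitals, none farther than 50 miles) can be illustrated with a minimal sketch. The function name, the input layout (pairs of hospital identifier and distance in miles), and the way distances are obtained are assumptions made for illustration; the study does not describe its implementation.

```python
from typing import List, Tuple

MAX_CANDIDATES = 10         # "up to 10 of the nearest candidate hospitals"
MAX_DISTANCE_MILES = 50.0   # "a maximum of 50 miles"


def candidate_hospitals(distances: List[Tuple[str, float]]) -> List[str]:
    """Return up to 10 hospital IDs within 50 miles, nearest first.

    `distances` holds hypothetical (hospital_id, miles) pairs for one patient.
    """
    # Drop hospitals beyond the distance limit.
    in_range = [(hid, miles) for hid, miles in distances
                if miles <= MAX_DISTANCE_MILES]
    # Sort the remaining hospitals by distance and keep the nearest ones.
    in_range.sort(key=lambda pair: pair[1])
    return [hid for hid, _ in in_range[:MAX_CANDIDATES]]
```

Each ranking approach would then pick its top hospital for the patient from this candidate list.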

Ratings
For each hospital, online consumer ratings from Yelp [6], Healthgrades [22], and US News [19] and quality star ratings from CMS Hospital Compare [10] were used for ranking purposes. The 2019 ratings available at the time of conducting this study were used for each approach in the absence of historical values for these approaches. Yelp and CMS Hospital Compare included overall hospital ratings scored between 1 and 5 (rather than specialty-specific ratings). Healthgrades online reports included ratings for hospital performance specific to hip replacement procedures. US News ratings for orthopedic surgery for each hospital were used. If two or more hospitals had the same ratings, the hospital with the larger number of reviews (or patient volume in the case of CMS ratings) was considered the top-ranked hospital.
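The tie-breaking rule above (equal ratings are resolved in favor of the hospital with more reviews, or more patient volume for CMS) amounts to a two-key comparison. The sketch below is illustrative only: the record type and field names are hypothetical, and skipping hospitals that a source does not rate is an assumption, since the text does not say how unrated hospitals were handled.

```python
from typing import List, NamedTuple, Optional


class RatedHospital(NamedTuple):
    hospital_id: str
    rating: Optional[float]  # None when the source has no rating (assumption)
    n_reviews: int           # review count, or patient volume for CMS ratings


def top_rated(candidates: List[RatedHospital]) -> Optional[str]:
    """Pick the highest-rated hospital, breaking ties by review count."""
    rated = [h for h in candidates if h.rating is not None]
    if not rated:
        return None
    # Compare on (rating, n_reviews): rating first, review count as tie-breaker.
    best = max(rated, key=lambda h: (h.rating, h.n_reviews))
    return best.hospital_id
```

For example, two hospitals rated 4.5 with 120 and 300 reviews would resolve to the one with 300 reviews.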

Statistics
Procedural volumes were calculated for hip replacement using Medicare FFS data for each hospital between 2013 and 2017. The average postprocedure hospitalization rate and ED admission rate at each hospital following hip replacement procedures were calculated using Medicare FFS data as a measure of quality and were also treated as a rating. For each of these ratings (procedural volume and average rates), a single average was computed from 2013 to 2017. For evaluating volume-based ratings, the top hospital was the one with the highest average yearly volume of hip replacement surgeries. For evaluating average outcome-based ratings, the top hospital was the one with the lowest combined average 90-day postprocedure hospitalization rate and ED admission rate for hip replacement surgery.
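The volume-based and average outcome-based rankings above reduce to simple aggregates over the 2013 to 2017 data. The following sketch shows one way to compute them. All names are hypothetical, and treating the "combined" rate as the sum of the hospitalization rate and the ED admission rate is an assumption, because the text does not state how the two rates were combined.

```python
from typing import Dict


def average_yearly_volume(yearly_counts: Dict[int, int]) -> float:
    """Mean number of hip replacements per year over the years supplied."""
    return sum(yearly_counts.values()) / len(yearly_counts)


def combined_outcome_rate(n_surgeries: int, n_hospitalized: int, n_ed: int) -> float:
    """Hospitalization rate plus ED admission rate (summing is an assumption)."""
    return (n_hospitalized + n_ed) / n_surgeries


def top_by_volume(volumes: Dict[str, Dict[int, int]]) -> str:
    """Hospital with the highest average yearly volume."""
    return max(volumes, key=lambda h: average_yearly_volume(volumes[h]))


def top_by_outcome(rates: Dict[str, float]) -> str:
    """Hospital with the lowest combined outcome rate."""
    return min(rates, key=lambda h: rates[h])
```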

Precision Navigation
Machine learning-based rankings of orthopedic facilities were generated for each beneficiary undergoing hip replacement in 2018 using a commercially available software system (Precision Navigation; Health at Scale Corp) that uses precision navigation and was trained on data prior to 2018 (ie, from 2013-2017) [18]. The top hospital for each beneficiary was determined as the top-ranked result returned by this system based on the patient's individual characteristics. The system predicts the hospitals likely to have the best long-term outcomes for patients based on the patient's personalized medical characteristics by learning from historical data about hospitals' long-term outcomes on similar patients.

Outcomes
The 90-day postprocedure hospitalization rate and rate of ED admission following the surgery were measured as outcomes.
The 90-day total cost of care (TCOC) was also measured for each patient; it included the cost of the surgery and all subsequent costs incurred by the patient over the 90 days following the day of surgery. The TCOC estimate included reimbursements for inpatient, outpatient, skilled nursing facility, and home health agency care, as these were available for the FFS beneficiaries. All patients included in the analyses had at least 90 days of follow-up (ie, follow-up data for patients were available until April 2019).
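As an illustration of the TCOC definition above, the sketch below sums reimbursements across the four care settings within 90 days of surgery. The claim record, the setting labels, and the inclusive handling of the window boundaries are assumptions for illustration rather than details taken from the study.

```python
from datetime import date, timedelta
from typing import List, NamedTuple


class Claim(NamedTuple):
    service_date: date
    setting: str      # hypothetical label for the care setting
    paid_usd: float   # reimbursement amount


# Settings included in TCOC per the text; the label strings are hypothetical.
INCLUDED_SETTINGS = {"inpatient", "outpatient", "snf", "home_health"}
WINDOW = timedelta(days=90)


def total_cost_of_care(surgery_date: date, claims: List[Claim]) -> float:
    """Sum reimbursements from the day of surgery through day 90 (inclusive)."""
    window_end = surgery_date + WINDOW
    return sum(
        c.paid_usd
        for c in claims
        if c.setting in INCLUDED_SETTINGS
        and surgery_date <= c.service_date <= window_end
    )
```

Claims outside the window, or from settings not listed in the text, are left out of the total.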
The observed average outcomes (postprocedure hospitalization rate, ED admission) and TCOC were compared between patients who received treatment at the top-ranked hospital for hip replacement surgery in 2018 and the overall population (all patients who had hip replacement surgery in 2018) in an unadjusted analysis. The unadjusted comparisons were also done in subgroups stratified by the Elixhauser comorbidity score. In each comparison, we statistically compared the outcomes of interest between patients who were admitted to a top-ranked hospital and patients who were admitted to a lower-ranked hospital (as determined by each respective method).
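The unadjusted comparison above contrasts the outcome rate among patients treated at their top-ranked hospital with the rate in the overall population. A minimal sketch of that arithmetic follows. The function names and the boolean-list inputs are hypothetical, and the significance test is omitted because the text does not name the test that was used.

```python
from typing import List


def outcome_rate(events: List[bool]) -> float:
    """Fraction of patients who experienced the outcome."""
    return sum(events) / len(events)


def unadjusted_improvement(events: List[bool], at_top: List[bool]) -> float:
    """Overall rate minus the rate at top-ranked hospitals, in percentage points.

    `events[i]` marks whether patient i had the outcome (eg, a 90-day
    hospitalization); `at_top[i]` marks whether patient i was treated at the
    hospital ranked top for them. A positive value means top hospitals did better.
    """
    top_events = [e for e, t in zip(events, at_top) if t]
    return 100.0 * (outcome_rate(events) - outcome_rate(top_events))
```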
Pairwise comparisons between patients who visited a top-ranked hospital based on the different approaches were made using an adjusted propensity-based analysis with a one-to-one matching caliper without replacement [23]. Propensity was calculated using the weighted Elixhauser comorbidity score (each Elixhauser comorbidity was weighted to calculate a composite score for hospital readmissions) [24]. The propensity-matched groups for each pairwise comparison were generated 100 times by sampling with replacement. Thus, within each pairwise comparison between the various rating approaches, only patients with similar comorbidity scores (based on the matching caliper) were compared.

Results
Table 1 presents the characteristics of the Medicare FFS beneficiaries in the Chicago metropolitan area that received an inpatient hip replacement in 2018. The data consisted of 4192 hip replacement surgeries performed at 69 total hospitals. Table 2 presents descriptive summaries for consumer ratings, quality stars, reputation rankings, average yearly surgical volume, average outcome rates, and precision navigation-based ranking for hospitals in the Chicago metropolitan area. The majority of the approaches had ratings for all of the hospitals in the region (all approaches excluding US News and Healthgrades, which covered more than half but not all of the hospitals in the Chicago metropolitan area).
Table 3 compares the outcomes and TCOC of patients who had surgeries at top hospitals with those of the overall population for each approach. Patients who underwent surgeries at hospitals that were ranked as top hospitals according to the CMS quality stars, Yelp, the average volume, the average outcome rate, and the precision navigation-based approaches were associated with substantially better outcomes than the population averages. This improvement in outcomes was greatest for precision navigation.
The US News and Healthgrades approaches were associated with improvements in ED admission rates and postprocedure hospitalization rates or in TCOC but not in both. The precision navigation-based rankings were associated with an improvement of 3.4% for postprocedure hospitalization rate (P<.001), 4.1% for ED admission rates (P<.001), and US $3315 for TCOC (P<.001) between top hospitals and the overall population. The other ranking methods were associated with smaller variations in outcomes between top hospitals and the overall population. The percentage of patients who underwent surgery at a hospital ranked as a top hospital by each ratings approach varied from 27.6% (1159/4192) for the precision navigation-based rankings to 49.6% (2078/4192) for the average volume-based rankings.
Table 4 presents the average outcomes and TCOC for patients who had surgeries at top hospitals, stratified by their Elixhauser comorbidity scores. For both low (≤0) and high (>0) comorbidity score groups, patients who visited top hospitals using precision navigation-based ranking had the best outcomes and TCOC.
Table 5 compares the outcomes and TCOC of patients in a pairwise, adjusted, propensity-matched analysis. For each pairwise comparison, patients matched to top hospitals using both ranking approaches were propensity matched, and the differences in average outcome rates between these propensity-matched patients were computed. In the pairwise propensity-matched comparison using the precision navigation-based approach with each of the other approaches, the improvement in outcomes was greatest for precision navigation for every outcome and every pairwise comparison.
The potential impact of using different rating systems under the counterfactual assumption that patients only presented to a top hospital for them was also considered.
The precision navigation-based ranking approach resulted in significantly more hospitals being designated as a top hospital for one or more Medicare FFS beneficiaries (ie, it distributed or balanced the case load across a larger number of hospitals that were top ranked for specific patients). Out of 69 total hospitals visited by patients in 2018 for hip replacements, 54 were considered top-ranked hospitals for at least one patient when using precision navigation. CMS ratings resulted in 25 hospitals being top ranked. US News and Healthgrades each resulted in 27 hospitals being top ranked. Yelp-based ranking resulted in 22 hospitals being considered top ranked. Average volume and outcome resulted in 26 and 24 hospitals, respectively, being considered top-ranked hospitals for a patient.
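The pairwise propensity-matched comparisons reported in Table 5 rely on one-to-one caliper matching without replacement on the weighted Elixhauser comorbidity score. The sketch below illustrates that idea under stated assumptions: the greedy nearest-neighbor strategy, the caliper width, and all function names are hypothetical, and the 100 resampling repetitions mentioned in the text are not shown.

```python
from typing import List, Tuple


def caliper_match(scores_a: List[float], scores_b: List[float],
                  caliper: float) -> List[Tuple[int, int]]:
    """Greedy 1:1 matching without replacement; returns (index_a, index_b) pairs."""
    available = set(range(len(scores_b)))
    pairs = []
    for i, score in enumerate(scores_a):
        if not available:
            break
        # Nearest unused patient in group B (ties resolved by lowest index).
        j = min(available, key=lambda k: (abs(scores_b[k] - score), k))
        # Accept the pair only if the scores differ by no more than the caliper.
        if abs(scores_b[j] - score) <= caliper:
            pairs.append((i, j))
            available.remove(j)  # without replacement: each B patient used once
    return pairs


def matched_rate_difference(outcome_a: List[int], outcome_b: List[int],
                            pairs: List[Tuple[int, int]]) -> float:
    """Outcome rate in matched group A minus outcome rate in matched group B."""
    if not pairs:
        return 0.0
    rate_a = sum(outcome_a[i] for i, _ in pairs) / len(pairs)
    rate_b = sum(outcome_b[j] for _, j in pairs) / len(pairs)
    return rate_a - rate_b
```

Here group A and group B would hold the patients treated at top hospitals under the two ranking approaches being compared, and patients without a partner inside the caliper are dropped from that comparison.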

Discussion
Prior research on the use of popular ranking and rating approaches, including web-based ratings, consumer guides, and various quality ratings for physicians or hospitals, has resulted in inconsistent findings, and it is unclear which rating approach works best [5,20,21,25-28]. This retrospective study compares the performance of several different rating strategies for designating top hospitals for a large population of Medicare FFS beneficiaries who underwent elective hip replacement surgeries in the Chicago metropolitan area. The study also compared the performance of the aforementioned approaches for hospital rankings and ratings with a more personalized precision navigation-based approach that selects hospitals based on patients' individual health characteristics.
Overall, several approaches were shown to be associated with better outcomes and lower TCOC when patients presented to a top hospital based on the respective ranking approach. These included CMS quality stars and precision navigation-based rankings, with top-ranked hospitals achieving improved outcomes and TCOC for hip replacement using both propensity-adjusted and nonadjusted analyses. The greatest improvements were observed for precision navigation-based rankings, which were more consistently associated with reductions in 90-day postprocedure hospitalization rate, ED admission rate, and TCOC in each analysis.
Prior research has found inconsistent correlations between outcomes and ratings in top-ranked hospitals. For example, Cram et al [29] showed no significant differences in total knee arthroplasty outcomes in top-ranked and non-top-ranked hospitals using the US News rankings. Studies in other surgical subspecialties, such as cardiac surgery, found only a weak correlation between online ratings and perioperative mortality [25]. Osborn et al [27] showed a significant correlation between favorable rankings and lower mortality rates associated with various major surgical procedures. However, use of mortality as a surrogate for outcome may not reflect other pertinent outcomes that occur with higher frequency, such as postprocedure hospital admissions or ED visits.
In this study, hospital performance was measured using the rates of postprocedure hospitalizations and ED visits and the total cost of care among Medicare FFS beneficiaries undergoing elective hip replacements. These metrics, which were available for all hospitals included in this study, are meaningful to patients. Many of the popular consumer-based ranking approaches are based more on patient satisfaction than on objective measures [6,25,26,30]. Velasco et al [31] found that negative online comments about orthopedic surgeons were associated with surgery-independent factors, such as waiting time and logistics. Austin et al [32] showed that there was considerable variation in the rankings of top hospitals when different criteria were used. The significant variation in ratings across the different rating platforms may complicate the choice of institution for patients [7,8,33]. Moreover, all of these approaches implicitly assume that the choice of a top hospital is independent of the characteristics of the patient [19,32]. The results of this study call this assumption into question.
Typical methods for ranking hospitals for specialty procedures assume that the quality of an institution is the same for all patients. The personalized approach of precision navigation-based ranking predicts outcomes for each hospital-patient pair by learning from the respective hospital's prior outcomes for patients who resemble the patient being matched. Thus, this study suggests that hospital quality may be a personalized, variable phenomenon rather than a global, uniform value. The overwhelming majority of hospitals considered are not consistently ideal (ie, the top choice) for all patients but instead lie largely in a "gray zone" of being selectively good or bad for individual patients. More than 85% (80/94) of the hospitals that were included in the Chicago metropolitan area fell in this category. This finding is perhaps not surprising, given the complexity of surgical patients and the variations in case mixes, resources, training, and other organizational characteristics of hospitals that lead to facilities performing well or poorly for specific individuals. Prior research has shown that patient complexity is strongly correlated with outcomes in elective surgeries, such as total knee arthroplasty [34]. Bozic et al [35] also found considerable variation in the patient population acuities of hospitals performing elective hip and knee replacements and a fourfold difference in risk-adjusted complication rates.
The potential policy implications of our study are significant. The paradigm of patients choosing institutions based on crowdsourced online ratings, popular consumer guides, and established metrics, such as high-volume centers, as surrogates for high performance may not be suitable for identifying the preferred centers for patients contemplating elective surgery.
Average outcome-based measures are more reliable than online consumer rankings but still far from optimal, and they did not result in lower TCOC. Our results suggest that a personalized approach based on precision navigation that uses readily available data to characterize a patient's medical complexity in the context of individual hospitals may be associated with substantial improvements in outcomes while also lowering TCOC. An additional policy concern is that any mechanism designed to encourage patients to go to specific hospitals should ideally balance hospital use and capacity. Prioritizing hospital choice based on the typical static rating approaches would lead to a greater concentration of surgeries in a small number of top hospitals, thus overwhelming certain hospitals while underutilizing others. The personalized approach distributed patients over more hospitals (54 hospitals) than any of the other approaches (27 or fewer hospitals). Finally, the low proportion of patients that were matched to top hospitals using the precision navigation-based approach (<30%) presents a substantial opportunity for improving outcomes and costs for many patients by steering them to hospitals that are best suited for them.
One of the limitations of this study is the incomplete characterization of each patient's unique medical needs and medical history, since only CMS administrative claims were available. Ideally, the online ratings would be based on reviews collected before the date of the procedure. However, historical ratings for the online web-based systems were not available. Furthermore, the use of ED visits and postprocedure hospitalizations as outcomes does not reflect the quality-of-life improvement following elective surgery that patients ultimately seek. This study was retrospective in nature and thus only showed associations between outcomes and various ranking and rating methods rather than suggesting causal relationships between ranking methods and patient outcomes. The ranking methods that were evaluated in this study's experimental design may not have been intended to be used as stand-alone referral services or to offer advice on the suitability or choice of particular hospitals. Rather, they may have been intended to be used (where applicable) as additional information that can be factored in with other inputs (such as referral recommendations provided by patients' care providers) in choosing a hospital site for a future surgery. This study only evaluated the association of these ranking methodologies within the experimental framework considered, and the findings here should not be taken as a statement of their suitability for any purpose. Patients' choices in elective surgeries can be influenced by socioeconomic factors as well as factors beyond their control, such as the availability of surgeons, recommendations from their care providers, and the network limitations of their health insurance plans.
The present study was performed using Medicare beneficiary claims data that primarily comprised older patients, and the applicability of its findings to younger patients with private health care insurance or Medicaid requires further research. Another limitation of this work was that the analysis was restricted to the greater Chicago metropolitan area. Future work is needed to extend this study to other procedures, non-Medicare populations, and hospitals nationwide. A prospective trial is warranted to further study the impact of hospital-ranking approaches on patient outcomes and total cost of care.