Published on 09.05.18 in Vol 20, No 5 (2018): May

Preprints (earlier versions) of this paper are available at http://preprints.jmir.org/preprint/9160, first published Oct 10, 2017.

    Original Paper

    Differences in Online Consumer Ratings of Health Care Providers Across Medical, Surgical, and Allied Health Specialties: Observational Study of 212,933 Providers

    1Division of Urology, Cedars-Sinai Medical Center, Los Angeles, CA, United States

    2Cedars-Sinai Center for Outcomes Research and Education, Cedars-Sinai Medical Center, Los Angeles, CA, United States

    3Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA, United States

    4Department of Medicine, Division of Health Services Research, Cedars-Sinai Health System, Los Angeles, CA, United States

    5Department of Health Policy and Management, UCLA Fielding School of Public Health, Los Angeles, CA, United States

    Corresponding Author:

    Timothy Daskivich, MSHPM, MD

    Division of Urology

    Cedars-Sinai Medical Center

    8635 West 3rd Street, Suite 1070W

    Los Angeles, CA, 90048

    United States

    Phone: 1 310 423 4700

    Fax: 1 310 423 1886

    Email:


    ABSTRACT

    Background: Health care consumers are increasingly using online ratings to select providers, but differences in the distribution of scores across specialties and skew of the data have the potential to mislead consumers about the interpretation of ratings.

    Objective: The objective of our study was to determine whether distributions of consumer ratings differ across specialties and to provide specialty-specific data to assist consumers and clinicians in interpreting ratings.

    Methods: We sampled 212,933 health care providers rated on the Healthgrades consumer ratings website, representing 29 medical specialties (n=128,678), 15 surgical specialties (n=72,531), and 6 allied health (nonmedical, nonnursing) professions (n=11,724) in the United States. We created boxplots depicting distributions and tested the normality of overall patient satisfaction scores. We then determined the specialty-specific percentile rank for scores across groupings of specialties and individual specialties.

    Results: Allied health providers had higher median overall satisfaction scores (4.5, interquartile range [IQR] 4.0-5.0) than physicians in medical specialties (4.0, IQR 3.3-4.5) and surgical specialties (4.2, IQR 3.6-4.6, P<.001). Overall satisfaction scores were highly left skewed (normal between –0.5 and 0.5) for all specialties, but skewness was greatest among allied health providers (–1.23, 95% CI –1.280 to –1.181), followed by surgical (–0.77, 95% CI –0.787 to –0.755) and medical specialties (–0.64, 95% CI –0.648 to –0.628). As a result of the skewness, the percentages of overall satisfaction scores less than 4 were only 23% for allied health, 37% for surgical specialties, and 50% for medical specialties. Percentile ranks for overall satisfaction scores varied across specialties; percentile ranks for scores of 2 (0.7%, 2.9%, 0.8%), 3 (5.8%, 16.6%, 8.1%), 4 (23.0%, 50.3%, 37.3%), and 5 (63.9%, 89.5%, 86.8%) differed for allied health, medical specialties, and surgical specialties, respectively.

    Conclusions: Online consumer ratings of health care providers are highly left skewed, fall within narrow ranges, and differ by specialty, which precludes meaningful interpretation by health care consumers. Specialty-specific percentile ranks may help consumers to more meaningfully assess online physician ratings.

    J Med Internet Res 2018;20(5):e176

    doi:10.2196/jmir.9160

    KEYWORDS



    Introduction

    Health care consumers are increasingly using commercial online consumer ratings websites to rate and select medical providers. A recent study of 600 randomly selected physicians from 3 metropolitan areas in the United States revealed that 66% of physicians had at least one rating across several popular online ratings websites, with a median of 7 reviews per physician [1]. Patients also appear to strongly trust these data. Even as early as 2012, a survey found that 59% of US adults believed that online ratings websites were “somewhat important” or “very important” in selecting a physician [2]. And perhaps more strikingly, a survey of 1000 surgical outpatients from the Mayo Clinic in Rochester, MN, USA, found that 75% of patients would choose a physician and 88% would avoid seeing a physician based on ratings data alone [3]. Payers and health systems are also now including consumer ratings in their online tools for patients, which provides tacit endorsement for the ratings’ validity in comparing doctors [4,5]. The extent of consumers’ use of online ratings suggests that these data have important implications for the use of health services and may even have downstream effects on health.

    Yet, despite the public’s strong interest and trust in online physician ratings, numeric physician ratings are difficult to interpret because there are no established benchmarks for scoring and results are not normalized for meaningful interpretation [6]. The most popular online consumer ratings websites use a 5-star Likert-type scale to rate providers, often reported as an overall score and sometimes across performance domains. While consumers may assume that higher scores (ie, scores of 4 and 5) indicate above-average performance, this may not be so if ratings are not normally distributed [7]. In fact, the percentile rank for a given star rating may differ drastically based on how scores are distributed, such that a seemingly high score may indicate average or even below-average performance [8]. Furthermore, distributions of scores may differ by specialty due to the varying perceptions of performance associated with patients’ specific needs and the services provided by different specialties.

    In this study, we sought to determine how online provider consumer ratings are distributed across medical, surgical, and allied health professions and whether score distributions differ across individual specialties in the United States. To address this question, we created a novel dataset consisting of over 2.7 million reviews of approximately 830,000 providers reviewed in both the US Centers for Medicare & Medicaid Services (CMS) Physician Compare [9] and the Healthgrades online consumer rating websites [10]. Our objectives were to (1) describe the distribution of quantitative overall satisfaction scores in aggregate and across provider specialties, (2) assess whether these distributions were normal, (3) quantify how overall satisfaction scores related to percentile rank across provider specialties, and (4) provide specialty-specific lookup tables showing percentile rank by overall satisfaction score. We hypothesized that overall satisfaction scores would be strongly left skewed toward higher scores across all specialties, such that seemingly high scores would be associated with a relatively low percentile rank. Lookup tables translating overall satisfaction scores into specialty-specific percentile ranks would allow for consumer ratings data to be communicated to patients in a more meaningful and accurate manner.


    Methods

    Data Source and Participants

    We sampled online consumer reviews for providers in the United States from the Healthgrades website. Our dataset consisted of all reviews up to March 31, 2017, of 830,308 health care providers. We aggregated data at the provider level to calculate an average rating for each provider across a variety of metrics: overall satisfaction, level of trust in provider’s decisions, how well the provider explains medical conditions, how well the provider listens and answers questions, and spending the appropriate amount of time with patients. We collected data on the following office metrics: ease of scheduling urgent appointments, the office environment, staff friendliness and courteousness, and total wait time. We also captured data on the number of reviews per provider. We linked these data to demographic information publicly available on the CMS Physician Compare website [9] using national provider identification numbers to capture medical specialty, region, sex, and year of graduation from medical school. Allied health specialties were defined as health professions distinct from medicine and nursing. We excluded providers with no data on overall patient satisfaction (n=345,862); no data on primary specialty (n=11,762); fewer than 4 reviews (the median number of reviews per provider in the overall dataset; n=255,202); and providers in nursing specialties (n=4549). Our final analytic sample consisted of 212,933 providers.
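
    As a quick consistency check on the sample flow described above, the short Python sketch below subtracts the reported exclusion counts from the 830,308 providers in the initial dataset. It assumes the exclusions were applied sequentially and without overlap, which the text implies but does not state outright; the dictionary labels are illustrative only.

```python
# Sample-flow arithmetic using the counts reported in the Methods.
initial_providers = 830_308

# Exclusion counts as reported; the labels are illustrative, and the
# exclusions are ASSUMED to be sequential and non-overlapping.
exclusions = {
    "no overall satisfaction data": 345_862,
    "no primary specialty": 11_762,
    "fewer than 4 reviews": 255_202,
    "nursing specialties": 4_549,
}

remaining = initial_providers
for reason, count in exclusions.items():
    remaining -= count
    print(f"after excluding {reason}: {remaining:,}")

final_sample = remaining
```

    Under that assumption, the remainder matches the reported analytic sample of 212,933 providers.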

    Variables

    Consumer Ratings

    The Healthgrades website asks consumers to rate providers on a 5-star Likert-type scale across the domains of patient experience listed above. Individual ratings are quantized at the ordinal level, though average ratings are reported to the 10th decimal place. Average ratings for each domain were aggregated at the provider level.

    Covariates

    We collected information on US geographical region (New England, Mid-Atlantic, East North Central, West North Central, South Atlantic, East South Central, West South Central, Mountain, and Pacific), sex (male, female), and graduation year (in deciles of graduation year) using linked data from the CMS Physician Compare website.

    Statistical Analysis

    We first compared our sample characteristics across medical, surgical, and allied health specialties using chi-square analysis for categorical variables and the Wilcoxon-Mann-Whitney test for continuous variables.

    To assess whether consumer ratings scores followed a normal distribution, we created histograms showing the distribution of overall patient satisfaction scores across the medical, surgical, and allied health specialties, along with individual specialties. We then assessed the divergence from normality by determining skewness and kurtosis. Skewness is a measure of symmetry of the distribution of scores, with a negative skew indicating a preponderance of higher scores and a positive skew indicating a preponderance of lower scores; normal distributions generally have skewness values between –0.5 and 0.5. Kurtosis is a measure of the tailedness of the distribution compared with the standard normal distribution; positive kurtosis values indicate a heavier tail and a higher propensity for outliers, while negative values indicate a lighter tail. Normal distributions generally have kurtosis values around 0. We performed bootstrap resampling with 100 replicates to obtain bootstrap confidence intervals for skewness and kurtosis across groupings of specialties using the basic bootstrap method.
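
    The normality measures described above can be illustrated with a minimal Python sketch (the study itself used R). It assumes moment-based definitions of skewness and excess kurtosis, which may differ slightly from the estimators used in the analysis, and it runs on synthetic left-skewed scores rather than study data. The function names and the simple quantile indexing are hypothetical choices made for illustration.

```python
import random


def _central_moment(xs, k):
    """k-th central moment of xs (population form, divisor n)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def skewness(xs):
    """Moment-based skewness; negative values indicate a left skew."""
    return _central_moment(xs, 3) / _central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Moment-based kurtosis minus 3, so a normal distribution is near 0."""
    return _central_moment(xs, 4) / _central_moment(xs, 2) ** 2 - 3.0


def basic_bootstrap_ci(xs, stat, replicates=100, level=0.95, seed=0):
    """Basic bootstrap interval: (2*estimate - upper_q, 2*estimate - lower_q)."""
    rng = random.Random(seed)
    estimate = stat(xs)
    # Resample with replacement and recompute the statistic each time.
    boot = sorted(stat(rng.choices(xs, k=len(xs))) for _ in range(replicates))
    alpha = 1.0 - level
    lower_q = boot[round((alpha / 2) * (replicates - 1))]
    upper_q = boot[round((1 - alpha / 2) * (replicates - 1))]
    return 2 * estimate - upper_q, 2 * estimate - lower_q


# Synthetic left-skewed "ratings" on a 1-5 scale (NOT study data).
rng = random.Random(42)
scores = [max(1.0, 5.0 - rng.expovariate(1.2)) for _ in range(2000)]

skew = skewness(scores)
ci_low, ci_high = basic_bootstrap_ci(scores, skewness)
print(f"skewness {skew:.2f}, basic bootstrap 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

    The 100 replicates mirror the number stated in the Methods; a larger number would give more stable interval endpoints.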

    We then calculated the percentile rank for overall patient satisfaction scores within individual specialties and visualized them in a scatterplot figure. We used a locally weighted scatterplot smoother to visualize percentile rank by overall patient satisfaction scores across groupings of specialties.
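
    A percentile rank of this kind can be sketched in a few lines of Python. The sketch assumes that the percentile rank of a score is the percentage of providers in the same specialty with a strictly lower score, which is consistent with how the abstract equates the percentage of scores below 4 with the percentile rank of a score of 4. The function name and the example scores are hypothetical.

```python
from bisect import bisect_left


def percentile_rank(peer_scores, score):
    """Percentage of peer scores strictly below `score` (assumed definition)."""
    ordered = sorted(peer_scores)
    return 100.0 * bisect_left(ordered, score) / len(ordered)


# Hypothetical provider-level mean ratings for one specialty (NOT study data).
specialty_scores = [3.1, 3.8, 4.0, 4.2, 4.4, 4.5, 4.6, 4.7, 4.9, 5.0]

rank_of_4 = percentile_rank(specialty_scores, 4.0)
rank_of_4_6 = percentile_rank(specialty_scores, 4.6)
print(rank_of_4, rank_of_4_6)
```

    In this toy specialty, a 4.0-star provider outranks only 2 of 10 peers, which shows how a seemingly high score can map to a low percentile rank when scores cluster near the top of the scale.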

    We used P<.05 to denote the statistical significance of 2-sided tests. All statistical analyses were performed in R version 3.4.0 (R Foundation for Statistical Computing). The Cedars-Sinai Institutional Review Board approved this study.


    Results

    Our analytic sample comprised 212,933 providers across 29 medical specialties (n=128,678), 15 surgical specialties (n=72,531), and 6 allied health professions (n=11,724; Table 1). Most providers in our sample were male (156,556/212,933, 73.52%), were from the South region (80,751/212,933, 37.92%), and graduated from medical school after 1985 (146,246/212,933, 68.68%). A larger proportion of providers in medical specialties than in surgical specialties or allied health professions were women (P<.001). Allied health providers graduated later than those in the medical or surgical specialties (P<.001).

    Median overall satisfaction scores differed significantly by provider specialty (Figure 1). Allied health providers had higher median overall satisfaction scores (4.5, interquartile range [IQR] 4.0-5.0) than physicians in medical (4.0, IQR 3.3-4.5) and surgical specialties (4.2, IQR 3.6-4.6; P<.001). There were also significant differences in median scores across subdomains of physician metrics and office and staff performance metrics by specialty (P<.001; Table 1).

    Measures of normality also differed by provider specialty. Overall satisfaction scores were highly left skewed for all provider groups, but skewness differed by specialty (Figure 2). Allied health providers had the largest negative skewness (ie, preponderance of higher scores; –1.23, 95% CI –1.280 to –1.181), compared with physicians in the surgical specialties (–0.77, 95% CI –0.787 to –0.755) and medical specialties (–0.64, 95% CI –0.648 to –0.628). Distributions of overall satisfaction scores had variable kurtosis across specialties; allied health providers had the largest positive kurtosis (ie, heavy-tailed with more outliers; 1.30, 95% CI 1.109-1.531), compared with physicians in the surgical specialties (0.26, 95% CI 0.206-0.315) and medical specialties (–0.07, 95% CI –0.101 to –0.041).

    To communicate consumer ratings data in a way that accounts for differences in distribution of overall satisfaction scores across specialties, we calculated the percentile rank for overall satisfaction scores by provider specialty. This information allows for translation of a provider’s overall satisfaction rating into a percentile ranking compared with others in their specialty. Consistent with the left skew of the data, percentile ranks were low for seemingly high overall satisfaction scores across all specialties. Percentile rank for overall satisfaction varied across allied health, medical specialties, and surgical specialties for scores of 2 (0.7%, 2.9%, 0.8%, respectively), 3 (5.8%, 16.6%, 8.1%), 4 (23.0%, 50.3%, 37.3%), and 5 (63.9%, 89.5%, 86.8%; Figure 3). As a point of reference, if overall satisfaction scores were normally distributed, the 50th percentile would occur at a score of 3. Percentile rank for overall satisfaction scores also differed substantially by individual specialties, reflecting their variable deviation from normality (Figure 4). A Web-based tool for translating overall satisfaction ratings to specialty-specific percentile rankings is available [11].
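
    The group-level figures reported above can be arranged as a small lookup table. The Python sketch below is only an illustration built from the whole-star values in this paragraph; it is not the Web-based tool [11], which works at the level of individual specialties, and the names used here are hypothetical.

```python
# Group-level percentile ranks reported in the Results for whole-star scores.
REPORTED_PERCENTILE_RANK = {
    "allied health": {2: 0.7, 3: 5.8, 4: 23.0, 5: 63.9},
    "medical": {2: 2.9, 3: 16.6, 4: 50.3, 5: 89.5},
    "surgical": {2: 0.8, 3: 8.1, 4: 37.3, 5: 86.8},
}


def reported_rank(group, stars):
    """Return the reported percentile rank, or None if it was not reported."""
    return REPORTED_PERCENTILE_RANK.get(group, {}).get(stars)


for group in REPORTED_PERCENTILE_RANK:
    print(f"{group}: 4 stars -> percentile rank {reported_rank(group, 4):.1f}")
```

    Read side by side, the same 4-star score corresponds to a percentile rank of 23.0 in allied health, 37.3 in surgical specialties, and 50.3 in medical specialties.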

    Table 1. Sample characteristics.
    Figure 1. Boxplots depicting the distribution of mean overall satisfaction ratings by provider specialty. OB/GYN: obstetrics and gynecology.
    Figure 2. Frequency of mean overall patient satisfaction scores across medical, surgical, and allied health providers.
    Figure 3. Percentile rank versus mean overall patient satisfaction for allied health, medical specialties, and surgical specialties. Percentile rank associated with overall patient satisfaction was first calculated within individual specialties (eg, internal medicine, podiatry, urology) as represented by scatter dots. Lines represent the locally weighted scatterplot smoother best fit for percentile rank among specialty groupings (ie, medical, surgical, allied health). Gray bars around lines represent 95% confidence intervals for percentile rank estimates among specialty groupings.
    Figure 4. Percentile rank versus mean overall patient satisfaction across individual specialties. Emerg: emergency; Gen: general; Hem/Onc: hematology and oncology; Int: interventional; OB/GYN: obstetrics and gynecology; Occ: occupational; PM&R: physical medicine and rehabilitation; Prev: preventative; Recon: reconstructive.

    Discussion

    Principal Findings

    Online consumer ratings of health care providers are playing an increasing role in how consumers perceive and select providers. However, since online ratings lack standardized benchmarks for assessment and because ratings are not normalized, it is unclear how consumers should interpret scores. Our study showed that overall satisfaction scores are consistently left skewed, fall within narrow ranges, and have different distributions across specialties; as a result, scores that appear high might actually be in the lowest quartile of scores, effectively misleading patients about perceived quality or experience of care. Scores for allied health specialties tended to be the least normally distributed (6/6, 100% of specialties were either moderately or highly skewed, ie, had skewness below –0.5), followed by surgical specialties (14/15, 93% of specialties) and medical specialties (16/29, 55% of specialties). Overall satisfaction scores also fall within narrow ranges; the average IQR spanned only 1.2 stars for medical specialties and 1.0 for allied health and surgical specialties. Median overall satisfaction scores also varied across specialties, with median values ranging from 3.4 to 4.6 for medical specialties, 3.9 to 4.6 for surgical specialties, and 4.2 to 4.9 for allied health professions.

    Deviations from normality and differences in score distributions (ie, median, IQR) across specialties have a substantial impact on how scores should be interpreted by consumers. First, since scores across all specialties were drastically left skewed, consumers should be aware that most scores are high, which falsely implies that most doctors are above average. We found that median values for overall satisfaction scores were 4.0, 4.2, and 4.5, and the 25th percentiles for overall satisfaction scores were 3.4, 3.5, and 4.0 for medical, surgical, and allied health professions, respectively. Given this information, a score of 3, which would be considered average if scores were normally distributed, would be exceedingly low in terms of percentile rank across all medical professions. Second, due to the narrow ranges of scores within professions, consumers should be aware that small differences in scores may represent large differences in percentile rank; for example, a difference of 0.5 stars between two surgical providers may indicate a quartile difference in percentile rank. Third, given the significant differences in median overall satisfaction score distributions across specialties, there may be even more granular differences in how scores should be interpreted for individual specialties. For example, a urologist with 4.6 stars would be at the 80th percentile among his or her peers, whereas a cardiothoracic surgeon with the same star rating would be only at the 50th percentile.

    In response to these findings, there are several feasible measures that could improve the interpretability of online physician consumer ratings data. First, data should be reported in a way that accounts for its consistent left skewness and nonnormality. One option would be to report the median star rating for each physician as a specialty-specific percentile rank, which would reflect the nonparametric nature of the data, would reduce the impact of outliers, and would be easily interpretable [8]. Another option would be to report the frequency of ratings falling within specialty-specific quartiles of performance, which would accomplish similar goals. Second, data should be reported in a way that accounts for varying distributions across specialties and subspecialties, since our data showed that patients have different benchmarks for scoring for different health care services and types of providers. We believe that our rubric for calculating percentile rank by average overall satisfaction score for individual specialties (available in a user-friendly, Web-based format [11]) may be a useful tool for describing these data to patients in a meaningful way.

    While consumer ratings data may seem trivial to health care providers who are often focused on hard end points related to health [12], it is important to note that health care consumers strongly trust these data and choose providers based on them [2,3]. Although studies have shown that numeric online consumer ratings are not related to quality or value of care [13,14], this has not dampened the public’s enthusiasm about their use. In fact, numerous surveys have shown that patients use online consumer ratings as the sole determinant of whether or not to see a physician in consultation over three-quarters of the time [3,15]. This underscores the need for physicians to be focused not only on technical execution of their practice but also on providing excellent customer service. If patients believe that customer service (vis-à-vis consumer ratings) is important, we as health care providers should respond by measuring it accurately, describing it meaningfully, and making it a priority in the way we practice, not by ignoring it in favor of what we feel to be more important [12,16,17]. Ultimately, measurements of quality of care and consumer ratings should be provided in tandem to help consumers understand these separate components of the patient experience [5].

    Study Limitations

    Our study has some limitations. First, it is unclear whether results from the Healthgrades website are generalizable to other consumer ratings platforms, since distributions of scores may differ from platform to platform. Second, our findings may underestimate the degree of nonnormality of physician ratings due to our exclusion of providers with few ratings, since the vast majority of physicians with 1 rating had scores of 5. We decided to exclude physicians with fewer than 4 reviews (the median number of reviews in our overall sample) to ensure that average scores were representative of multiple ratings; sensitivity analyses showed little difference between distributions when we increased the threshold for the number of reviews beyond 4. Third, we cannot account for self-rating of physicians or other practices that may be used to artificially inflate consumer ratings scores; our reported scores represent distributions that would be observed in the real-life setting. Fourth, because we did not weight individual physician ratings scores by number of reviews, our reported results describe the distribution of average scores at the physician level.

    Conclusions

    Online consumer ratings of physicians are an increasingly important factor in how patients perceive and select physicians. We found that scores were highly left skewed, fell within narrow ranges, and differed by specialty; this may mislead consumers into overestimating providers with seemingly high scores who are actually mediocre or poor when compared with peers in their specialty. We herein provide a Web-based tool for translating an overall satisfaction star rating into a percentile rank comparing the provider with others in his or her specialty, an approach that accounts for the skewness and specialty-specific differences in satisfaction scores. As online consumer ratings grow in popularity, consumers will no doubt demand more detailed forms of information regarding provider service, including comparisons within specialties such as we present here. We hope our work stimulates more research on how to convey consumer ratings data in a clear, fair way, given the degree to which this information affects health care consumers’ decisions.

    Conflicts of Interest

    None declared.

    References

    1. Lagu T, Metayer K, Moran M, Ortiz L, Priya A, Goff SL, et al. Website characteristics and physician reviews on commercial physician-rating websites. JAMA 2017 Feb 21;317(7):766.
    2. Hanauer DA, Zheng K, Singer DC, Gebremariam A, Davis MM. Public awareness, perception, and use of online physician rating sites. JAMA 2014 Feb 19;311(7):734-735.
    3. Burkle CM, Keegan MT. Popularity of internet physician rating sites and their apparent influence on patients' choices of physicians. BMC Health Serv Res 2015;15:416.
    4. Physician performance based compensation. Minnetonka, MN: UnitedHealthcare Services, Inc; 2018. URL: https://www.uhcprovider.com/en/reports-quality-programs/physician-perf-based-comp.html?rfid=UHCOContRD [accessed 2018-04-13]
    5. Jha A. Harvard Business Review. 2015 Oct 23. Health care providers should publish physician ratings. URL: https://hbr.org/2015/10/health-care-providers-should-publish-physician-ratings [accessed 2018-04-23]
    6. Emmert M, Sander U, Pisch F. Eight questions about physician-rating websites: a systematic review. J Med Internet Res 2013;15(2):e24.
    7. Kadry B, Chu LF, Kadry B, Gammas D, Macario A. Analysis of 4999 online physician ratings indicates that most patients give physicians a favorable rating. J Med Internet Res 2011;13(4):e95.
    8. Sullivan GM, Artino AR. Analyzing and interpreting data from likert-type scales. J Grad Med Educ 2013 Dec;5(4):541-542.
    9. Centers for Medicare & Medicaid Services. Medicare Physician Compare Website. URL: https://www.medicare.gov/physiciancompare/search.html [accessed 2018-04-23]
    10. Healthgrades. Denver, CO: Healthgrades Operating Company, Inc; 2018. URL: https://www.healthgrades.com/ [accessed 2018-04-10]
    11. Compare My Doc. Los Angeles, CA: Cedars-Sinai CORE; 2017. URL: https://www.comparemydoc.com/ [accessed 2018-04-13]
    12. Goldman E. How doctors should respond to negative online reviews. New York, NY: Forbes Media LLC; 2013 Nov 21. URL: https://www.forbes.com/sites/ericgoldman/2013/11/21/how-doctors-should-respond-to-negative-online-reviews/ [accessed 2018-04-13]
    13. Okike K, Peter-Bibb TK, Xie KC, Okike ON. Association between physician online rating and quality of care. J Med Internet Res 2016;18(12):e324.
    14. Daskivich TJ, Houman J, Fuller G, Black JT, Kim HL, Spiegel B. Online physician ratings fail to predict actual performance on measures of quality, value, and peer review. J Am Med Inform Assoc 2017 Sep 08;25(4):401-407.
    15. Emmert M, Meier F, Pisch F, Sander U. Physician choice making and characteristics associated with using physician-rating websites: cross-sectional study. J Med Internet Res 2013;15(8):e187.
    16. Merrell JG, Levy BH, Johnson DA. Patient assessments and online ratings of quality care: a “wake-up call” for providers. Am J Gastroenterol 2013 Nov;108(11):1676-1685.
    17. Holliday AM, Kachalia A, Meyer GS, Sequist TD. Physician and patient views on public physician rating websites: a cross-sectional study. J Gen Intern Med 2017 Jun;32(6):626-631.


    Abbreviations

    CMS: Centers for Medicare & Medicaid Services
    OB/GYN: obstetrics and gynecology
    IQR: interquartile range


    Edited by G Eysenbach; submitted 10.10.17; peer-reviewed by T Lagu, R Robinson; comments to author 23.11.17; revised version received 17.01.18; accepted 24.01.18; published 09.05.18

    ©Timothy Daskivich, Michael Luu, Benjamin Noah, Garth Fuller, Jennifer Anger, Brennan Spiegel. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 09.05.2018.

    This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.