This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
In recent years, Web-based ratings from both commercial and government sources have rapidly changed the information environment in which patients learn about physician quality. However, little is known about how various types of Web-based ratings affect individuals’ choice of physicians.
The objective of this research was to measure the relative importance of Web-based quality ratings from governmental and commercial agencies on individuals’ choice of primary care physicians.
In a choice-based conjoint experiment conducted on a sample of 1000 Amazon Mechanical Turk users in October 2016, individuals were asked to choose their preferred primary care physician from pairs of physicians with different ratings in clinical and nonclinical aspects of care provided by governmental and commercial agencies.
The relative log odds of choosing a physician increases by 1.31 (95% CI 1.26-1.37;
Individuals perceive nonclinical ratings provided by commercial websites to be as important as clinical ratings provided by government websites when choosing a primary care physician. There are significant gender differences in how the ratings are used. More research is needed on whether patients are making the best use of different types of ratings, as well as on the optimal allocation of resources for improving physician ratings from the government’s perspective.
To improve quality, foster competition, promote transparency, and help patients make informed decisions, it is critical for patients to have access to reliable information and make cognizant choices about their medical providers [
The Centers for Medicare and Medicaid Services (CMS) is the most prominent governmental agency in the United States that collects, aggregates, and reports quality measures of different aspects of medical care. Through initiatives such as Hospital Compare [
Ratings of health care providers are growing in importance and popularity [
Despite the significant differences between the types (clinical and nonclinical) and the sources (governmental and commercial agencies) of ratings, variations in their relative significance for patient choice of medical providers are not known. The purpose of this research was to fill this gap by uncovering the relative importance of these ratings in the decision-making processes of different groups of patients.
We used a primary dataset consisting of responses from 1000 individuals who were each paid 50 cents to participate in an online experiment through Amazon Mechanical Turk (AMT) in October 2016. These individuals were all master users of AMT and lived in the United States. According to AMT, a user achieves a master distinction by consistently completing requests with a high degree of accuracy. Masters must continue to pass AMT’s statistical monitoring to maintain their status [
To determine how ratings on different attributes affect individuals’ evaluations of medical providers, we designed an experiment and conducted a choice-based conjoint analysis [
The combination of 2 categories (clinical and nonclinical) and 2 sources (governmental and commercial agencies) resulted in 4 different types of ratings: clinical ratings provided by a governmental agency, nonclinical ratings provided by a governmental agency, clinical ratings provided by a commercial agency, and nonclinical ratings provided by a commercial agency. In this research, we use “governmental agency” and “public agency” interchangeably. We assigned a high or low value to each type of rating, and thereby created 16 profiles of hypothetical physicians. In a 1-to-4-star rating system, to induce appropriate variation, we used 2 stars to indicate low ratings and 4 stars to indicate high ratings. Each profile represented a physician with different ratings on the 4 categories. These profiles were
In a Web-based interface, we first provided respondents with a brief tutorial on different sources and types of ratings. Specifically, we described the public agency as “the department of Health and Human Services, which is a branch of the federal government” and the commercial agency as “websites such as Yelp, RateMDs, Healthgrades, Vitals, Zocdoc, and DoctorScorecard.”
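The profile counts that follow from this design can be checked with a short enumeration. The sketch below is illustrative only: the attribute names are hypothetical labels for the 4 rating types, and it simply lists every combination of levels to show why 2 levels per rating give 16 profiles while 5 levels would give 625.

```python
from itertools import product

# Hypothetical labels for the 4 rating types (2 categories x 2 sources).
RATING_TYPES = ("gov_clinical", "gov_nonclinical", "com_clinical", "com_nonclinical")


def enumerate_profiles(levels):
    """Return every physician profile as a dict mapping rating type to star level."""
    return [
        dict(zip(RATING_TYPES, combo))
        for combo in product(levels, repeat=len(RATING_TYPES))
    ]


# Two levels per rating (2 stars = low, 4 stars = high), as used in the experiment.
two_level = enumerate_profiles((2, 4))

# Five levels per rating (1 to 5 stars), the alternative the design avoided.
five_level = enumerate_profiles((1, 2, 3, 4, 5))

print(len(two_level), len(five_level))
```

The count grows as the number of levels raised to the power of 4, which is why restricting each rating to 2 levels keeps the comparison task manageable.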
Characteristics of 949 respondents and the US population.
Variable and class | Sample, n (%) | Percentage of US populationa (%) | |||
Advanced degree | 114 (12.0) | 10.38 | −1.65c | ||
Bachelor’s degree | 381 (40.2) | 18.88 | −16.74d | ||
Associate’s degree | 100 (10.5) | 5.28 | −7.25d | ||
Some college, no degree | 231 (24.3) | 19.42 | −3.83d | ||
Trade or technical school | 30 (3.2) | 4.08 | 1.43 | ||
Graduated high school | 90 (9.5) | 29.63 | 13.59d | ||
Less than high school | 3 (0.3) | 12.33 | 11.25d | ||
150,000 or more | 30 (3.2) | 13.57 | 9.36d | ||
125,000-149,999 | 25 (2.6) | 5.42 | 3.8d | ||
100,000-124,999 | 70 (7.4) | 8.71 | 1.45 | ||
75,000-99,999 | 123 (13.0) | 12.26 | −0.66 | ||
50,000-74,999 | 227 (23.9) | 16.96 | −5.71d | ||
35,000-49,999 | 170 (17.9) | 12.92 | −4.58d | ||
25,000-34,999 | 136 (14.3) | 9.39 | −5.22d | ||
Less than 25,000 | 168 (17.7) | 20.77 | 2.33e | ||
Asian | 56 (5.9) | 5.70 | −0.27 | ||
Black | 66 (7.0) | 13.30 | 5.76d | ||
Hawaiian | 1 (0.1) | 0.20 | 0.62 | ||
Hispanic | 53 (6.0) | 17.8 | 9.84d | ||
Indian | 16 (1.7) | 1.30 | −1.06 | ||
White | 757 (79.8) | 76.90 | −2.1e | ||
Divorced | 74 (7.8) | 9.80 | 2.07e | ||
Married/Domestic partner | 471 (49.6) | 51.87 | 1.38 | ||
Separated | 8 (0.8) | 2.09 | 2.69d | ||
Single/Never married | 390 (41.1) | 32.25 | −5.83d | ||
Widowed | 6 (0.6) | 5.72 | 6.75d | ||
Female | 548 (57.7) | 50.80 | −4.28d | ||
Male | 401 (42.3) | 49.20 | 4.28d | ||
Younger than 65 years | 924 (97.4) | 87.00 | −9.49d | ||
65 years and older | 25 (2.6) | 13.00 | 9.49d |
aAuthors’ analysis of characteristics of experiment participants. Demographics of US population are calculated based on the data provided by the US Census Bureau.
bThe null hypothesis that the percentage in sample is equal to that of the US population.
c
d
e
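The test statistics in the last column of the table above can be approximated with a one-sample test of proportions. The table does not name the test, so the sketch below rests on two assumptions: the standard error uses the US population share, and the sign is population share minus sample share. Under those assumptions the function reproduces the reported values for the rows checked here. The function name is hypothetical.

```python
import math


def proportion_z(count, n, population_share):
    """One-sample z statistic for H0: the sample share equals the population share.

    Assumed sign convention: population share minus sample share.
    """
    sample_share = count / n
    standard_error = math.sqrt(population_share * (1 - population_share) / n)
    return (population_share - sample_share) / standard_error


N_RESPONDENTS = 949

# Female: 548 of 949 respondents versus 50.80% of the US population.
z_female = proportion_z(548, N_RESPONDENTS, 0.5080)

# Bachelor's degree: 381 of 949 respondents versus 18.88% of the US population.
z_bachelor = proportion_z(381, N_RESPONDENTS, 0.1888)

print(round(z_female, 2), round(z_bachelor, 2))
```

A negative value indicates that the group is overrepresented in the sample relative to the US population, which matches the direction of the entries for female and college-educated respondents.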
Physician profiles used in choice-based conjoint experiment. “Government” indicates that a public agency provides the ratings, and “Commercial” indicates that a private organization provides the ratings. In the Web-based interface, the hypothetical physician profiles in each pair were shown side-by-side and respondents were asked to choose the physician they preferred. The sequence of the pairs and the attributes in each profile were generated randomly to ensure that the order of presentation or the rank of the attributes did not influence the respondent’s choice. The values of 2 or 4 in the table, respectively, indicate a “2” or “4” star rating in the physician profiles provided to respondents in the Web-based experiment.
Pair numbera | Government clinical | Government nonclinical | Commercial clinical | Commercial nonclinical |
One | 2; 4 | 4; 2 | 2; 4 | 4; 2 |
Two | 2; 4 | 4; 2 | 4; 2 | 4; 2 |
Three | 2; 4 | 2; 2 | 2; 2 | 4; 2 |
Four | 4; 2 | 2; 4 | 2; 4 | 4; 2 |
Five | 4; 2 | 2; 4 | 4; 2 | 4; 2 |
Six | 4; 2 | 4; 2 | 2; 4 | 4; 2 |
Seven | 2; 4 | 2; 4 | 4; 2 | 4; 2 |
Eight | 2; 4 | 2; 4 | 2; 4 | 2; 4 |
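The pairs in the table above can be encoded directly to see how the two physicians in each pair differ. The sketch below assumes that the first value in each cell belongs to the first physician shown and the second value to the second physician. The table itself does not state this. It computes per-attribute differences and flags any pair in which one physician is at least as good on every rating and strictly better on at least one. The labels GC, GNC, YC, and YNC follow the abbreviations used in the results tables.

```python
ATTRIBUTES = ("GC", "GNC", "YC", "YNC")

# (first physician, second physician) star ratings per attribute, copied from the table.
PAIRS = {
    1: ((2, 4), (4, 2), (2, 4), (4, 2)),
    2: ((2, 4), (4, 2), (4, 2), (4, 2)),
    3: ((2, 4), (2, 2), (2, 2), (4, 2)),
    4: ((4, 2), (2, 4), (2, 4), (4, 2)),
    5: ((4, 2), (2, 4), (4, 2), (4, 2)),
    6: ((4, 2), (4, 2), (2, 4), (4, 2)),
    7: ((2, 4), (2, 4), (4, 2), (4, 2)),
    8: ((2, 4), (2, 4), (2, 4), (2, 4)),
}


def differences(pair):
    """Second physician's rating minus the first physician's, per attribute."""
    return tuple(second - first for first, second in pair)


def has_dominant_profile(pair):
    """True when one physician is never worse and at least once better."""
    diffs = differences(pair)
    second_dominates = all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)
    first_dominates = all(d <= 0 for d in diffs) and any(d < 0 for d in diffs)
    return second_dominates or first_dominates


dominated_pairs = [number for number, pair in PAIRS.items() if has_dominant_profile(pair)]
print(dominated_pairs)
```

Under this reading only pair Eight contains a physician who is better on every rating, so a respondent choosing the lower-rated physician in that pair is unlikely to be answering attentively.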
Screenshot of the choice-based conjoint experiment.
We also distinguished clinical and nonclinical ratings and explained to the survey respondents that clinical ratings by the public agency were determined “based on official statistics on how often physicians provide care that research shows leads to the best results for patients” and nonclinical ratings by the public agency were determined based on “a national survey that asks patients about their experiences with staff, nurses, and doctors during a recent visit to the doctor.” Similarly, we explained that clinical ratings provided by the commercial agency were determined by “the patient online reviews about how patients evaluate the medical expertise of the doctor” and nonclinical ratings provided by the commercial agency were created based on “patient online reviews about their experiences with staff, nurses, and doctors during a recent visit to the doctor.” To assess whether respondents correctly distinguished between the types and the sources of ratings, at the end of the survey, we asked them to describe each type of rating in their own words. Our examination of their responses confirmed that all respondents had fully understood the different ratings.
We then presented the 8 pairs of hypothetical profiles of physicians in a random sequence and asked respondents to choose the physician they prefer in each pair. A screenshot of 1 of the 8 comparison pairs is presented in
Once respondents finished the evaluation of physicians in the 8 pairs, we asked them a series of questions designed to evaluate their health status, medical literacy, trust in Web-based reviews, and trust in government as 4 composite indexes. We conducted factor analysis to operationalize these 4 constructs using validated items that we derived from prior literature in information systems [
One potential concern with the study design was that respondents may not complete the choice task thoughtfully. To detect and filter the responses that were provided hastily and without careful attention, we included 2 trap questions in the experiment.
The first trap question was the choice of physicians in the eighth pair (shown in
Our research design fit the multinomial logit model with clustered error terms [
On the basis of the answers to the 2 trap questions, we excluded 51 observations from our initial sample of 1000 responses. We retained the remaining 949 responses for further analysis (
As shown in the last (full model) column of
One standard deviation improvement in a patient’s health status increased the relative log odds of choosing a physician with 4 stars in commercial nonclinical ratings by 0.18 (95% CI 0.13-0.24;
Medical literacy had no statistically significant effect on how patients evaluated different types of ratings. As the level of trust in overall Web-based ratings increased, the importance of nonclinical ratings provided by a commercial agency also increased. One standard deviation increase in a patient’s trust in Web-based reviews increased the relative log odds of choosing a physician with 4 stars in nonclinical commercial ratings by 0.07 (95% CI 0.02-0.13;
One standard deviation increase in a patient’s trust in government increased the relative log odds of choosing a physician with 4 stars in government clinical ratings by 0.20 (95% CI 0.15-0.25;
The relative importance of different types and sources of ratings on patients’ choice. GC: clinical ratings provided by a public agency (government). GNC: nonclinical ratings provided by a public agency (government). YC: clinical ratings provided by a commercial agency (commercial). YNC: nonclinical ratings provided by a commercial agency (commercial).
Parametera | Parameter estimate (95% CI) | | | | | |
 | Basic model | Health status | Medical literacy | Trust in online reviews | Trust in government | Full model |
GC | 1.29b | 1.29b | 1.29b | 1.29b | 1.30b | 1.31b |
GNC | 1.00b | 1.01b | 1.00b | 1.00b | 1.01b | 1.03b |
YC | 1.09b | 1.11b | 1.10b | 1.10b | 1.11b | 1.12b |
YNC | 1.29b | 1.31b | 1.29b | 1.29b | 1.30b | 1.32b |
Health status × GC | | −0.13c | | | | −0.13c |
Health status × GNC | | 0.09c | | | | 0.10c |
Health status × YC | | 0.05 | | | | 0.05 |
Health status × YNC | | 0.17b | | | | 0.18b |
Medical literacy × GC | | | 0 | | | 0 |
Medical literacy × GNC | | | 0 | | | −0.01 |
Medical literacy × YC | | | 0.03 | | | 0.03 |
Medical literacy × YNC | | | −0.01 | | | −0.01 |
Online trust × GC | | | | 0.06d | | 0.06 |
Online trust × GNC | | | | 0.02 | | 0.02 |
Online trust × YC | | | | −0.05 | | −0.05 |
Online trust × YNC | | | | 0.07d | | 0.07d |
Trust in government × GC | | | | | 0.19b | 0.20b |
Trust in government × GNC | | | | | 0.01 | 0.01 |
Trust in government × YC | | | | | 0.03 | 0.02 |
Trust in government × YNC | | | | | −0.14b | −0.15b |
aAuthors’ analysis of revealed choices in the choice-based conjoint analysis. Health status, medical literacy, online trust, and trust in government are composite indexes, centered around mean 0 with standard deviation of 1; 95% CI are reported in parentheses.
b
c
d
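Because the estimates in the table above are on the log odds scale, it can help to convert them into odds ratios and choice probabilities. The sketch below assumes a pair of physicians who differ only in the government clinical rating (GC), so that the coefficient is the full log odds difference between them. It uses the full model values of 1.31 for GC and 0.20 for the trust in government × GC interaction. The function names are hypothetical.

```python
import math


def odds_ratio(log_odds):
    """Odds ratio implied by a log odds coefficient."""
    return math.exp(log_odds)


def choice_probability(log_odds):
    """Probability of choosing the higher-rated physician in a two-physician pair."""
    return 1.0 / (1.0 + math.exp(-log_odds))


GC_MAIN_EFFECT = 1.31        # full model estimate for GC
TRUST_GOV_BY_GC = 0.20       # full model estimate for trust in government x GC

# Respondent with average trust in government (composite index = 0).
average_trust = choice_probability(GC_MAIN_EFFECT)

# Respondent whose trust in government is 1 standard deviation above the mean.
high_trust = choice_probability(GC_MAIN_EFFECT + 1.0 * TRUST_GOV_BY_GC)

print(round(odds_ratio(GC_MAIN_EFFECT), 2), round(average_trust, 3), round(high_trust, 3))
```

Read this way, a coefficient of 1.31 corresponds to odds of roughly 3.7 to 1 in favor of the physician with 4 stars on that rating, all else being equal, and the interaction term shifts that probability upward for respondents who trust government more.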
Interaction of ratings and patient characteristics. GC: clinical ratings provided by a public agency (government). GNC: nonclinical ratings provided by a public agency (government). YC: clinical ratings provided by a commercial agency (commercial). YNC: nonclinical ratings provided by a commercial agency (commercial).
Parameter | Parameter estimate (SE) | ||||||
Female | White | High income | High education | Married | Age | Full model | |
GC | 1.31a (0.05) | 1.24a (0.08) | 1.25a (0.05) | 1.18a (0.05) | 1.26a (0.05) | 1.15a (0.12) | 1.03a (0.14) |
GNC | 0.87a (0.05) | 1.16a (0.08) | 0.98a (0.04) | 1.03a (0.05) | 0.97a (0.04) | 1.22a (0.11) | 1.25a (0.14) |
YC | 1.15a (0.05) | 1.15a (0.08) | 1.08a (0.05) | 1.11a (0.05) | 1.07a (0.04) | 1.32a (0.11) | 1.42a (0.14) |
YNC | 1.19a (0.05) | 1.30a (0.08) | 1.24a (0.05) | 1.29a (0.05) | 1.18a (0.05) | 1.21a (0.12) | 1.13a (0.14) |
GC × Female | −0.03 (0.05) | −0.04 (0.05) | |||||
GNC × Female | 0.23a (0.05) | 0.23a (0.05) | |||||
YC × Female | −0.09 (0.05) | −0.10 (0.05) | |||||
YNC × Female | 0.18b (0.05) | 0.15b (0.05) | |||||
GC × White | 0.05 (0.09) | 0.03 (0.10) | |||||
GNC × White | −0.19c (0.09) | −0.17 (0.09) | |||||
YC × White | −0.08 (0.09) | −0.05 (0.09) | |||||
YNC × White | −0.01 (0.09) | −0.04 (0.10) | |||||
GC × Income | 0.074 (0.07) | 0.01 (0.08) | |||||
GNC × Income | 0.02 (0.07) | 0.02 (0.07) | |||||
YC × Income | 0.01 (0.07) | 0 (0.07) | |||||
YNC × Income | 0.10 (0.07) | 0.03 (0.08) | |||||
GC × Education | 0.21b (0.07) | 0.21b (0.07) | |||||
GNC × Education | −0.06 (0.07) | −0.07 (0.07) | |||||
YC × Education | −0.04 (0.07) | −0.04 (0.07) | |||||
YNC × Education | 0 (0.07) | −0.01 (0.07) | |||||
GC × Married | 0.05 (0.07) | 0.03 (0.08) | |||||
GNC × Married | 0.05 (0.07) | 0.046 (0.07) | |||||
YC × Married | 0.04 (0.07) | 0.09 (0.08) | |||||
YNC × Married | 0.22b (0.07) | 0.18c (0.08) | |||||
GC × Age | 0.003 (0.003) | 0.01 (0.01) | |||||
GNC × Age | −0.006c (0.002) | −0.006c (0.003) | |||||
YC × Age | −0.006c (0.002) | −0.006c (0.003) | |||||
YNC × Age | 0.002 (0.003) | 0 (0.003) |
a
b
c
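With only 2 physicians per choice, a multinomial logit on paired choices reduces to a binary logit on the attribute differences between the two physicians, with no intercept. The sketch below illustrates that reduction on simulated data. It is not the estimation code used in the study. The coefficients in `TRUE_BETA` are hypothetical values chosen only to generate choices, the design rows assume the first value in each cell of the pair table belongs to the first physician, and the fit uses plain gradient ascent without interactions or error terms clustered by respondent.

```python
import math
import random

# Second physician minus first physician, coded high = 1 and low = 0,
# for the 8 pairs in the order GC, GNC, YC, YNC (assumed reading of the pair table).
DESIGN = [
    (1, -1, 1, -1),
    (1, -1, -1, -1),
    (1, 0, 0, -1),
    (-1, 1, 1, -1),
    (-1, 1, -1, -1),
    (-1, -1, 1, -1),
    (1, 1, -1, -1),
    (1, 1, 1, 1),
]
TRUE_BETA = (1.3, 1.0, 1.1, 1.3)  # hypothetical values, used only for simulation


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def utility_difference(beta, row):
    return sum(b * x for b, x in zip(beta, row))


def simulate_choices(n_respondents, seed=0):
    """Count, per pair, how many simulated respondents choose the second physician."""
    rng = random.Random(seed)
    counts = [0] * len(DESIGN)
    for _ in range(n_respondents):
        for k, row in enumerate(DESIGN):
            if rng.random() < sigmoid(utility_difference(TRUE_BETA, row)):
                counts[k] += 1
    return counts


def fit_paired_logit(counts, n_respondents, steps=5000, step_size=1.0):
    """Maximize the average log-likelihood by gradient ascent (no intercept)."""
    beta = [0.0] * len(TRUE_BETA)
    for _ in range(steps):
        gradient = [0.0] * len(beta)
        for row, count in zip(DESIGN, counts):
            residual = count / n_respondents - sigmoid(utility_difference(beta, row))
            for j, x in enumerate(row):
                gradient[j] += residual * x / len(DESIGN)
        beta = [b + step_size * g for b, g in zip(beta, gradient)]
    return beta


N_RESPONDENTS = 949
estimates = fit_paired_logit(simulate_choices(N_RESPONDENTS), N_RESPONDENTS)
print([round(b, 2) for b in estimates])
```

Interactions such as rating × female would enter as additional columns formed by multiplying each attribute difference by the respondent characteristic. Clustering by respondent affects the standard errors rather than the point estimates.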
To the best of our knowledge, this was the first research that, using a conjoint analysis, uncovered how individuals used Web-based ratings to compare and choose medical providers. We found that the clinical ratings provided by the government and the nonclinical ratings provided by a commercial agency were significantly more important for patient choice than nonclinical ratings provided by the government or clinical ratings provided by commercial agencies. We also found some differences in the importance of ratings based on the sociodemographic and health characteristics of respondents. Healthier patients paid more attention to nonclinical ratings, especially those from a commercial agency. On the other hand, for healthier patients, the importance of clinical ratings, notably those provided by the government, was lower. We found that female patients gave more importance to nonclinical ratings provided by both public and commercial agencies, compared with males. In comparison with other races, white respondents paid less attention to the nonclinical ratings provided by the government. There was no other difference between racial groups in the importance of different types of ratings in the physician choice decision. Income did not play a role in the way respondents used the ratings in their decision. As patients got older, nonclinical ratings provided by the government and clinical ratings provided by a commercial agency became even less important in how they evaluated medical providers.
A particular strength of this study was that we utilized a carefully controlled experimental design to observe the revealed preferences of participants rather than merely asking them to state their preferences in response to a questionnaire, which could otherwise be subject to attribution or social desirability biases. Revealed preferences elicited in this experiment provided a more natural context, even when presented in hypothetical settings, and gave us greater confidence that the effects we observed within the sample were driven by the conjoint attributes rather than other unobserved factors.
One limitation of our study was that we rated the attributes of the physicians by either 2 or 4 stars, whereas in reality, the ratings usually have 5 levels, between 1 and 5 stars. We limited the ratings to only 2 levels to reduce the number of possible combinations. If we had considered 5 levels for each rating, the number of possible physician profiles would have surged from 16 to 625. Respondents could not reasonably compare this many physician profiles with each other. A second limitation of this study was that, in comparison with the US general population, its sample was drawn from younger, more educated, and less affluent individuals. Although samples from AMT have been shown to respond similarly to representative samples of the US population [
There are 3 potential areas for further research. The first is to examine how familiar individuals are with the sources of information provided by governmental and commercial agencies. Although most individuals are now fairly familiar with the commercial rating websites, knowledge about the other sources of information provided through governmental websites may be limited. It would be useful to quantify the level of awareness of such information as a precursor to designing appropriate policies to inform the public. The second is to replicate this study on an international sample to investigate how individuals outside of the United States rely on different sources and types of information for choosing their primary care physicians. Finally, the relative importance of Web-based ratings in comparison with other factors such as insurance coverage, recommendations of family and friends, and proximity to patients’ residence is still unclear and could be investigated in future research.
The findings of this research have implications for policy makers and medical providers. Although the government has expended substantial resources on clinical quality ratings, our study indicates a need to also acknowledge the importance of nonclinical measures. This is consistent with the recent CMS efforts and policy recommendations [
With respect to patients’ age, we found that older patients and those who trusted government more paid more attention to government-provided ratings. This is corroborated by prior literature, which documents that citizens who trust government more are also more satisfied with government websites [
Given the recent apprehensions expressed about the quality and representativeness of ratings provided by commercial websites [
Our research shows that patients pay equal attention to both clinical and nonclinical ratings when choosing a primary care physician. To obtain information about clinical ratings, they rely more on government sources, whereas for information on nonclinical ratings, they rely more on commercial sources. Both public and private agencies expend significant resources to design metrics, collect data, calculate ratings, and report them to the public. These resources are limited and should be optimally allocated to the type of ratings that consumers appreciate and will use the most. The findings of this research highlight the importance of efforts from government agencies such as CMS to improve their reporting of nonclinical ratings. Given the importance of nonclinical ratings in patients’ decision making, we recommend that medical providers pay close attention to their nonclinical ratings on commercial websites as they represent a consequential source of customer feedback for improving the patient experience. Ultimately, the overarching objective of all rating sources must be to protect patients from incorrect or misleading data, while simultaneously educating them on how to interpret and make the best use of the information presented.
How online quality ratings influence patients’ choice of medical providers: a controlled experimental survey study appendix (online-only material).
AMT: Amazon Mechanical Turk
CMS: Centers for Medicare and Medicaid Services
GG and WW are partially supported by NSF Career Award #1254021.
None declared.