This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
Online risk calculators offer different levels of precision in their risk estimates. People interpret numbers in varying ways depending on how they are presented, and we do not know how the number of decimal places displayed might influence perceptions of risk estimates.
The objective of our study was to determine whether precision (ie, number of decimals) in risk estimates offered by an online risk calculator influences users’ ratings of (1) how believable the estimate is, (2) risk magnitude (ie, how large or small the risk feels to them), and (3) how well they can recall the risk estimate after a brief delay.
We developed two mock risk calculator websites that offered hypothetical percentage estimates of participants’ lifetime risk of kidney cancer. Participants were randomly assigned to a condition where the risk estimate value rose with increasing precision (2, 2.1, 2.13, 2.133) or the risk estimate value fell with increasing precision (2, 1.9, 1.87, 1.867). Within each group, participants were randomly assigned one of the four numbers as their first risk estimate, and later received one of the remaining three as a comparison.
Participants who completed the experiment (N = 3422) were a demographically diverse online sample, approximately representative of the US adult population on age, gender, and race. Participants whose risk estimates had no decimal places gave the highest ratings of believability (
There are subtle but measurable differences in how people interpret risk estimates of varying precision. Adding decimal places in risk calculators offers little to no benefit and some cost. Rounding to the nearest integer is likely preferable for communicating risk estimates via risk calculators so that they might be remembered correctly and judged as believable.
Risk calculators abound online. Anyone with Internet access, a Web browser, some interest in their future health, and five minutes to spare can enter a few pieces of information about themselves and receive an assessment of their risk of human immunodeficiency virus infection [
However, questions remain about how to best design risk calculators to achieve the goals of risk communication. Different calculators vary significantly in terms of their adherence to best practices for risk communication [
Robust underlying models may enable calculators to give precise risk estimates. However, it is not known whether this additional precision is helpful or harmful for people using risk calculators. In other words, we
The importance of this question becomes apparent when one considers the complex range of challenges inherent in risk communication. Across levels of education and expertise, many people, particularly those with poor numeracy, have trouble interpreting numbers in health-risk communications [
The precision of a number, in particular, can affect how people perceive and act on numerical information. For example, home buyers offer bids closer to the asking price for houses with more precise list prices [
The effects of estimate precision in health have thus far been studied by examining responses to point estimates (eg, 9%) versus ranges of estimates (eg, 5%–13%). Previous qualitative research suggested that ranges of risk estimates may be perceived as more credible than point estimates [
In this study, we aimed to isolate the effects of precision—that is, number of decimal places—on people’s interpretations of risk estimates offered by online risk calculators. We selected believability (“How believable is this number?”) and risk magnitude (“How large or small does this number feel to you?”) as primary outcomes. Perceptions of believability and risk magnitude are critical to changing health attitudes and behavior [
Participants were asked to imagine they were visiting a kidney cancer risk calculator. (See
After completing the questions in the risk calculator, participants were shown the “result” that they had been randomly assigned. They were then asked to indicate the believability of the risk, how large or small it felt to them, and a series of secondary assessments about how well or poorly the following adjectives described the estimate they were given: accurate, precise, exact, likely to be wrong, scientific, and uncertain. These secondary assessments were taken from previous work done by our research group comparing point estimates and ranges [
To mimic a plausible response to receiving a risk estimate—namely, seeking a second estimate to confirm or contradict the first—participants were then directed to a second mock calculator that presented a second risk estimate. The second estimate was randomly assigned from the other three numbers in their rising or falling group of numbers. For example, participants assigned to the falling group might receive a first risk estimate of 1.9%, and their second risk estimate would be randomly assigned as either 2%, 1.87%, or 1.867%. Participants were then asked to compare the two numbers in terms of believability, as well as the secondary outcomes of accurate, precise, exact, likely to be wrong, scientific, and uncertain. To remove the possibility that recall differences might contaminate the comparisons, all comparison questions were presented with the estimates as labels, with the first estimate as the label for the first column, and the second estimate for the second column. We did not ask participants to compare the estimates in terms of risk magnitude because we predicted that the difference in expressed values (for example, 2 > 1.9) would dominate any effects of the level of precision and we therefore saw little benefit in increasing respondent burden by adding another comparison task.
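The two-stage random assignment described above can be sketched in a few lines. This is only an illustration under stated assumptions: the estimate values come from the study description, but the function name, the equal-probability draws, and the seed are hypothetical and are not taken from the authors' survey software.

```python
import random

# Estimate sets as described in the study; everything else here is illustrative.
RISING = ["2", "2.1", "2.13", "2.133"]
FALLING = ["2", "1.9", "1.87", "1.867"]


def assign_estimates(rng):
    """Return (direction, first_estimate, second_estimate) for one participant."""
    direction = rng.choice(["rising", "falling"])
    group = RISING if direction == "rising" else FALLING
    first = rng.choice(group)
    # The second estimate is drawn from the three values not shown first.
    second = rng.choice([value for value in group if value != first])
    return direction, first, second


rng = random.Random(2011)  # hypothetical seed, used only for reproducibility
for _ in range(3):
    print(assign_estimates(rng))
```

Estimates are kept as strings so that the displayed precision (for example, "2" versus "2.1") is preserved exactly.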
Participants completed another survey about hypothetical treatment choices for colon cancer, in which participants were cross-randomized to avoid any systematic interaction between the two surveys. They then completed a brief set of demographic and individual difference measures. Finally, on the last page of the combined survey, we asked participants to recall to the best of their ability both risk estimates they had been given. (The experimental procedure is detailed in
Flow Diagram of Experiment.
Email invitations were sent to a random sample of US adults aged 30 to 70 years, selected from a panel of Internet users administered by Survey Sampling International (Shelton, CT, USA) and stratified by gender, age, and race to ensure demographic diversity. The survey did not collect identifying information. Survey Sampling International uses a complex digital fingerprinting technique to ensure respondent uniqueness [
The
The
Secondary outcomes
To elicit
Measures of
Equation for calculating participants’ approximate recall (within 50% error) of their estimated risk:
This definition enables a wide margin of error, which we deemed appropriate for such a small risk estimate. Thus, recall estimates between approximately 1% and 3% were defined as correct approximate recall, whereas those outside the defined range were defined as incorrect.
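The original equation is not reproduced in this text. As a hedged sketch, the definition of approximate recall can be read as a relative-error check, assuming that "within 50% error" means the absolute difference between the recalled and given values is at most half of the given value, with the boundaries included. The function name and the inclusive comparison are assumptions.

```python
def approximately_correct(recalled, given, tolerance=0.5):
    """True if `recalled` is within `tolerance` relative error of `given`.

    Assumption: the boundary itself counts as correct (<=), which the
    source text does not specify.
    """
    return abs(recalled - given) <= tolerance * given


# For a given estimate of 2%, recalled values from 1% to 3% count as correct.
print(approximately_correct(1.0, 2.0))  # lower boundary
print(approximately_correct(3.5, 2.0))  # outside the range
```

Under this reading the acceptable range shifts slightly with the given estimate, which is consistent with the text describing the range as approximately 1% to 3%.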
Individual difference measures used in this study were as follows:
We analyzed ratings of believability and risk magnitude for the first risk estimates via multivariate analysis of variance (MANOVA). We used three independent variables: precision (number of decimals), direction of values (rising versus falling), and number of questions (4 screens versus 7). We included all main effects and all 2-way interactions in the model and conducted post hoc tests on precision (the only independent variable with more than two levels) via the Tukey least significant difference test. We performed a second MANOVA with the same model to examine the effects of the independent variables on the secondary outcomes accurate, precise, exact, likely to be wrong, scientific, and uncertain.
To explore participants’ assessments of the comparisons between the two risk estimates, observed differences in proportions for each measure were tested via 2-tailed binomial tests. These tests were conducted only on data from participants who judged the two numbers as different on that measure.
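To make the comparison analysis concrete, the following sketch computes an exact 2-tailed binomial P value under the null hypothesis that, among participants who judged the two numbers as different, each number is equally likely to be chosen. The function name and the counts in the example are hypothetical; the analysis reported here was run in SPSS, and this code is not the authors' procedure.

```python
from math import comb


def binomial_two_tailed_p(k, n):
    """Exact 2-tailed binomial P value for k successes in n trials, null p = 0.5."""
    observed = comb(n, k)
    # Under p = 0.5 every outcome i has probability comb(n, i) / 2**n, so an
    # outcome is "at least as extreme" when comb(n, i) <= comb(n, k).
    extreme = sum(comb(n, i) for i in range(n + 1) if comb(n, i) <= observed)
    return extreme / 2 ** n


# Hypothetical counts: of 100 participants who judged the two numbers as
# different, 65 chose the number with more decimals.
print(binomial_two_tailed_p(65, 100))
```

Working with integer binomial coefficients avoids floating-point underflow for large samples, since the division by 2**n happens only once at the end.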
Finally, we analyzed exact and approximate recall via repeated measures logistic regression, regressing recall on the number of decimals. We present both exact and approximate recall results, though we focus on approximate recall as the fair and practically relevant comparison. Recalling a value expressed to more decimal places exactly requires additional memory capacity, and it is unlikely that people would need to recall estimates to a high level of precision for any practical purpose.
Data were entered and analyzed in SPSS version 16.0 (IBM Corporation, Somers, NY, USA).
Out of 4242 people who clicked the link to launch the survey, 4117 (97%) continued beyond the informed consent page, and 3422 (81%) completed the survey. All completed surveys were analyzed. Completion rates were consistent across experimental conditions. Characteristics of study participants are shown in
Study participant characteristics (N = 3422)
Characteristic | ||
|
50 (11) | |
|
||
Female | 1723 (52%) | |
Male | 1582 (48%) | |
|
||
Hispanic | 486 (14%) | |
Middle Eastern | 46 (1%) | |
|
||
White or Caucasian | 2518 (74%) | |
Black or African American | 529 (16%) | |
American Indian or Alaska Native | 55 (2%) | |
Asian or Asian American | 150 (4%) | |
Pacific Islander or Native Hawaiian | 17 (0.5%) | |
Other | 167 (5%) | |
|
||
None | 2 (0.1%) | |
Elementary school | 4 (0.1%) | |
Some high school, but no diploma | 72 (2%) | |
High school (diploma or GED^a) | 665 (19%) | |
Trade school | 186 (6%) | |
Some college, but no degree | 990 (29%) | |
Associate’s degree (AA, AS, etc) | 357 (11%) | |
Bachelor’s degree (BS, BA, etc) | 759 (22%) | |
Master’s degree (MA, MPH, etc) | 306 (9%) | |
Doctoral/professional degree (PhD, MD, etc) | 61 (2%) |
^a General equivalency diploma.
The precision of the risk estimate was related to believability and perceived risk magnitude. In particular, risk estimates with zero decimals yielded the highest believability scores, with scores decreasing slightly with increasing number of decimal places (
The distribution of believability ratings is shown in
Primary outcomes
 | Believability: 1 = not at all, | Risk magnitude: 0 = extremely small,
Precision (number of decimals) | |
0 | 4.35 (1.24) (reference) | .21 (.24) (reference)
1 | 4.24 (1.23) ( | .24 (.24) (
2 | 4.21 (1.26) ( | .23 (.24) (
3 | 4.19 (1.22) ( | .26 (.25) (
Overall significance | |
Direction of values | |
Rising | 4.24 (1.24) | .24 (.24)
Falling | 4.26 (1.24) | .23 (.24)
Overall significance | |
Number of questions | |
Fewer | 4.21 (1.22) | .25 (.25)
More | 4.28 (1.26) | .22 (.24)
Overall significance | |
a
Distribution (n, %) of believability responses by precision (also see Table 4.1 in
Precision | Low believability | Moderate believability | High believability
0 | 63 (7%) | 353 (41%) | 450 (52%)
1 | 60 (8%) | 378 (47%) | 362 (45%)
2 | 80 (9%) | 389 (46%) | 383 (45%)
3 | 69 (8%) | 440 (50%) | 373 (42%)
Estimates with one decimal place were rated as the least uncertain compared to estimates with zero (
Ratings of accuracy were higher in the condition with more questions (
Overall, none of the secondary measures suggested potential mechanisms to explain the primary findings.
Individual difference measures demonstrated strong main effects in the expected directions. (See
When comparing the first and second risk estimates they were given, large majorities of participants indicated equality across all measures (see
Comparisons of two risk estimates
Percentage of participants who chose
Which number is more | Number with | Both numbers | Number with | Significance of observed proportion of
Believable^a? | 11% | 80% | 9% |
Accurate^b? | 13% | 70% | 17% |
Precise^b? | 13% | 62% | 25% |
Exact^b? | 13% | 63% | 24% |
Scientific^b? | 11% | 69% | 20% |
Likely to be wrong^b? | 13% | 74% | 14% |
Uncertain^b? | 15% | 72% | 13% |
^a Primary comparison outcome, question presented first on its own survey page.
^b Secondary comparison outcomes, questions presented together on one page in random order.
After completing the questions comparing the two risk estimates, participants spent a median of 9.6 minutes (interquartile range 6.5 minutes) answering an unrelated survey before reaching the recall task, in which they were asked to recall both risk estimates they had been given earlier. Participants were not warned that they would be asked to recall the numbers.
The proportions of participants with correct recall are shown in
Participants with correct recall
Precision | Exact recall: correct | Exact recall: odds ratio (95% CI) | Approximate recall: correct | Approximate recall: odds ratio (95% CI)
0 | 93% | Reference | 96% | Reference
1 | 83% | 0.36 (0.29–0.44) | 94% | 0.65 (0.49–0.86)
2 | 70% | 0.17 (0.14–0.21) | 95% | 0.70 (0.53–0.94)
3 | 43% | 0.06 (0.05–0.07) | 94% | 0.61 (0.45–0.81)
 | | Wald χ²₃ = 1014, | | Wald χ²₃ = 12.1,
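As a reading aid for the odds ratios in the recall table, the sketch below computes an unadjusted odds ratio directly from two proportions. It is an approximation only: the reported values come from a repeated measures logistic regression on unrounded data, so odds ratios recomputed from the rounded percentages will be close to, but not necessarily identical to, the published ones. The function names are hypothetical.

```python
def odds(proportion):
    """Convert a proportion (0 < proportion < 1) to odds."""
    return proportion / (1 - proportion)


def odds_ratio(proportion, reference):
    """Unadjusted odds ratio of `proportion` relative to `reference`."""
    return odds(proportion) / odds(reference)


# Exact recall with one decimal (83%) versus zero decimals (93%).
print(round(odds_ratio(0.83, 0.93), 2))
```

For this pair the unadjusted value is approximately 0.37, compared with the reported 0.36.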
This study suggests that risk calculators that produce risk estimates with different levels of precision can result in different perceptions of those estimates in terms of believability and risk magnitude, as well as differences in recall. In this experiment, risk estimates with zero decimals were judged as the most believable. People may find integers somewhat more believable than numbers with decimals simply because integers are easier to understand. As evidenced by, for example, jokes and confusion about an average American family having 2.2 children, it is challenging for people to map population-based statistics onto individual circumstances. Indeed, many people, even those who are well educated, have trouble with probabilities and percentages [
Risk estimates with the least precision (zero decimals) also felt smaller on average than estimates with greater precision. This finding parallels previous research on ratio bias, in which statistical frequencies presented using smaller denominators felt smaller than those that used larger denominators [
Lower precision was also associated with better recall of the given risk estimates. It is not particularly surprising that people found numbers with more decimal places more difficult to remember perfectly. Recalling four digits takes considerably more cognitive capacity than recalling one. It is more notable that, even when allowing for a very generous margin of error in a recall task that took place shortly after the estimate was provided, there were statistically significant differences in approximate recall between estimates with zero decimals and all three estimates with decimals. This means that using decimals in a risk estimate not only reduces the chances that users will be able to recall the number exactly, but also reduces the likelihood that they will be able to remember it even approximately. This may be partially attributable to a lack of understanding about the meaning of decimals, because if people are unable to comprehend the data that they have been given, they will not be able to turn it into information that can later be recalled.
This study also suggests that the number of questions asked in a risk calculator may have an effect on perceived risk magnitude. People who completed a longer questionnaire judged the risk estimate as smaller. Although our study found no statistical effect of the number of questions in the calculator on believability, this may have been because even our version with fewer pages of questions was sufficient to be over a threshold of believability. Further research will be required to explore the effects of very brief questionnaires on people’s assessments of risk calculators, but it is worth noting that, even with a very detailed questionnaire, the estimate with zero decimals still garnered the highest believability scores.
There are three main limitations to this study. First, this experiment was based on a hypothetical scenario with artificial risk estimates all around 2%. We do not know whether similar effects would be found in situations in which numbers are real and individualized for the user, people are self-motivated to seek out the risk information, and/or numbers are larger or smaller. However, our mock risk calculator was modeled after real-world examples, and thus we have no reason to believe that patient behavior would differ when using an actual risk calculator to which he or she was directed, for example, in a routine monthly email from his or her health care group or system. We acknowledge that it is more difficult to predict how people might respond in a similar situation in which they are deliberately seeking out the information. However, conducting a controlled experiment in which the only variation was random assignment of the number of decimals in the risk estimate allowed us to control for many of the complexities of how people decide whether a piece of online health information is trustworthy, thereby isolating the unique effects of the precision of the risk estimate. Findings regarding real-world use of risk calculators will depend to some extent on users’ prior expectations regarding their risk; for example, people may be resistant to accepting risk estimates that are higher than their prior expectations [
Second, all of the statistically significant findings in this study have small effect sizes. Single-digit
Third, although our study included some secondary outcomes selected in the hopes that these might help unpack any differences found in the primary outcomes, effects on the secondary measures were largely absent. This may be partly attributable to the small effect sizes on the primary outcomes—it can be harder to unpack a small box. Nonetheless, it would be useful to better understand the mechanisms behind any differences in how people perceive risk estimates expressed as integers versus those with decimal places. Further research will be required to achieve this understanding.
To our knowledge, there has been no prior work examining the effect of decimal precision in risk estimates.
However, our finding that increased precision leads to lower believability is in line with previous qualitative research suggesting that ranges of risk estimates may be seen as more credible than point estimates [
Our finding that increased precision also leads to perception of lower risk magnitude is in contrast with previous research in which more ambiguous risk estimates, meaning those expressed as ranges rather than point estimates, led to increased risk perceptions [
Research in consumer pricing suggests that prices with decimal places may be interpreted by simply ignoring numbers after the decimal place. If this were to also occur in health risks, we would expect to observe an interaction between precision and direction on perceived risk magnitude. That is, we would expect risk magnitude scores for the rising condition (2, 2.1, 2.13, and 2.133) to remain consistent regardless of precision, while those for the falling condition (2, 1.9, 1.87, and 1.867) would decrease between the first estimate and the other 3. We did not observe any such interaction, and we suggest that this is likely because, even if this effect exists in the health context, it may be significantly smaller and thus not detectable in this study. In other words, a price of $1 may feel different from $2, but a 1% risk may not feel meaningfully smaller than a 2% risk.
Our finding that fewer decimal places leads to better recall is consistent with research reporting that health communications that provide less detail lead to higher comprehension than those that provide more detail [
There are subtle but significant differences in how people interpret risk estimates of varying precision. Increasing precision in the form of decimal places shows no clear benefit and suggests small but significant costs. Results from our experiment suggest that, in general, rounding to the nearest integer is preferable for communicating small risk estimates so that they may be judged as believable and remembered correctly. Given these findings, we recommend that risk calculator designers structure their algorithms to express risk in integers, though expressions to 1 decimal place may also be acceptable in situations when user recall of the number is not an important consideration or when greater precision is necessary to show differences between two or more numbers.
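One way a risk calculator might apply this recommendation at the display step is sketched below. The function name and its parameters are hypothetical, and the choice of half-up rounding is an assumption (Python's built-in `round` rounds halves to the nearest even digit, which may surprise users). The sketch covers small risks near the 2% range studied here and does not address how to display risks that would round to 0%.

```python
from decimal import Decimal, ROUND_HALF_UP


def format_risk(percent, decimals=0):
    """Format a risk percentage for display, rounded half-up to `decimals` places."""
    step = Decimal(1).scaleb(-decimals)  # 1, 0.1, 0.01, ...
    rounded = Decimal(str(percent)).quantize(step, rounding=ROUND_HALF_UP)
    return f"{rounded}%"


for value in (2.133, 1.867):
    # Integer display by default; one decimal place as the optional fallback.
    print(format_risk(value), format_risk(value, decimals=1))
```

With the estimates used in this study, both 2.133 and 1.867 are displayed as "2%" by default, and as "2.1%" and "1.9%" when one decimal place is requested.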
Financial support for this study was provided by a grant from the US National Institutes of Health (R01 CA087595). Dr. Zikmund-Fisher is supported by a Mentored Research Scholar Grant from the American Cancer Society (MRSG-06-130-01-CPPB). The funding agreements ensured the authors’ independence in designing the study, interpreting the data, and publishing the report.
Three photos were included in the mock websites shown in Figures 1 and 2 in
The authors gratefully acknowledge Peter Ubel's comments on the early experimental design of this study.
None declared
Breast Cancer Risk Calculator Example.
Detailed Methods.
Detailed Flow Diagram.
Additional Details of Results.
MANOVA: multivariate analysis of variance
OR: odds ratio