Can an Internet-Based Health Risk Assessment Highlight Problems of Heart Disease Risk Factor Awareness? A Cross-Sectional Analysis
Justin B Dickerson1, MBA, PhD; Catherine J McNeal2,3, MD, PhD; Ginger Tsai3; Cathleen M Rivera2,3, MD, MS; Matthew Lee Smith1,4, MPH, PhD; Robert L Ohsfeldt5, PhD; Marcia G Ory1, MPH, PhD
1School of Public Health, Department of Health Promotion and Community Health Sciences, Texas A&M Health Science Center, College Station, TX, United States
2Department of Internal Medicine, Scott & White Healthcare, Temple, TX, United States
3College of Medicine, Texas A&M Health Science Center, College Station, TX, United States
4College of Public Health, Department of Health Promotion and Behavior, The University of Georgia, Athens, GA, United States
5School of Public Health, Department of Health Policy & Management, Texas A&M Health Science Center, College Station, TX, United States
School of Public Health
Department of Health Promotion and Community Health Sciences
Texas A&M Health Science Center
College Station, TX, 77843
Phone: 1 979 777 8714
Fax: 1 503 689 8928
Background: Health risk assessments are becoming more popular as a tool to conveniently and effectively reach community-dwelling adults who may be at risk for serious chronic conditions such as coronary heart disease (CHD). The use of such instruments to improve adults’ risk factor awareness and concordance with clinically measured risk factor values could be an opportunity to advance public health knowledge and build effective interventions.
Objective: The objective of this study was to determine if an Internet-based health risk assessment can highlight important aspects of agreement between respondents’ self-reported and clinically measured CHD risk factors for community-dwelling adults who may be at risk for CHD.
Methods: Data from an Internet-based cardiovascular health risk assessment (Heart Aware) administered to community-dwelling adults at 127 clinical sites were analyzed. Respondents were recruited through individual hospital marketing campaigns, such as media advertising and print media, found throughout inpatient and outpatient facilities. CHD risk factors from the Framingham Heart Study were examined. Weighted kappa statistics were calculated to measure interrater agreement between respondents’ self-reported and clinically measured CHD risk factors. Weighted kappa statistics were then calculated for each sample by strata of overall 10-year CHD risk. Three samples were drawn based on strategies for treating missing data: a listwise deleted sample, a pairwise deleted sample, and a multiple imputation (MI) sample.
Results: The MI sample (n=16,879) was most appropriate for addressing missing data. No CHD risk factor had better than marginal interrater agreement (κ>.60). High-density lipoprotein cholesterol (HDL-C) exhibited suboptimal interrater agreement that deteriorated (eg, κ<.30) as overall CHD risk increased. Conversely, low-density lipoprotein cholesterol (LDL-C) interrater agreement improved (eg, up to κ=.25) as overall CHD risk increased. Overall CHD risk of the sample was lower than comparative population-based CHD risk (ie, no more than 15% risk of CHD for the sample vs up to a 30% chance of CHD for the population).
Conclusions: Interventions are needed to improve knowledge of CHD risk factors. Specific interventions should address perceptions of HDL-C and LDL-C. Internet-based health risk assessments such as Heart Aware may contribute to public health surveillance, but they must address selection bias of Internet-based recruitment methods.
(J Med Internet Res 2014;16(4):e106)
health risk assessment; Internet; risk factors; heart disease; concordance
The Framingham Heart Study defines cardiovascular disease (CVD) as a combination of coronary heart disease (CHD), various types of stroke, peripheral artery disease, and heart failure. CVD is a leading cause of death in both males and females living in the United States. In 2006, CVD was responsible for the death of 1 in 4 Americans. CVD is one of the costliest medical conditions to treat, with total economic costs estimated at more than US $300 billion annually, a cost that is also predicted to rise substantially over the next 2 decades as a result of technological improvements in care coupled with minimal reduction in the prevalence of CVD.
Lifestyle risk factors, including tobacco and alcohol use, poor diet, lack of exercise, and obesity, as well as a genetic predisposition to problems such as familial hypercholesterolemia, contribute to the high prevalence of CVD. To facilitate prevention of CVD, it is important to measure CVD risk factors on a regular basis. In an effort to reduce CVD mortality, the National Heart Lung and Blood Institute (NHLBI) launched The Heart Truth campaign to raise awareness of CVD risk factors. A year later, the American Heart Association (AHA) adopted the Red Dress symbol and launched its own campaign, Go Red for Women, to emphasize the importance of knowing and reducing CVD risk factor values among at-risk females in accordance with clinical guidelines.
Patient engagement in appropriate screening and risk factor modification by health care providers is critical in preventing CVD. For example, it has been shown that a key element of dyslipidemia, a low value of high-density lipoprotein cholesterol (HDL-C), is largely an unknown CVD risk factor among the general public. Similar awareness of elements of dyslipidemia has also been shown to vary across different populations. As a result, it is important for patients being screened to be provided with education on CVD risk factors and CVD risk factor modification.
Self-report of risk factors often guides epidemiological studies of disease prevalence. As a result, it is vitally important to establish the accuracy of self-reported values against clinically measured values. Several recent studies have called the accuracy of self-reported data into question, especially for CVD risk factors [11-13]. Some studies have also produced discordant results about the accuracy of self-reported risk factor data for different socioeconomic groups and different geographic locations [14,15]. There has been a general lack of public health investigation of the agreement of self-reported and clinically measured CVD risk factors.
This study has three aims. First, it examines the degree of agreement between self-reported and clinically measured risk factors for CHD (the most prevalent CVD condition representing half of all cardiovascular diseases), including total cholesterol (TC), HDL-C, low-density lipoprotein cholesterol (LDL-C), systolic blood pressure (SBP), diastolic blood pressure (DBP), body mass index (BMI), and diabetes mellitus (DM) status, to understand which risk factors are most accurately reported by community-dwelling adults. Second, it analyzes agreement of self-reported and clinically measured CHD risk factor values according to the Framingham Heart Study’s 10-year CHD risk model to determine if those at higher risk of CHD have a greater understanding of their CHD risk factors versus those at lower risk of CHD (ie, the resulting kappa statistics of self-reported and clinically measured agreement are stratified by Framingham 10-year CHD risk). Finally, because self-reported data often present missing data challenges, the study examines whether the method of accounting for missing data influences the results.
Heart Aware Cardiovascular Health Risk Assessment
Heart Aware is a cardiovascular health risk assessment tool offered by Navigant Consulting, Inc (Navigant Consulting, Inc, Chicago, IL, USA). The data used in this study came from Heart Aware assessments conducted at 127 clinical sites across the United States between January 1, 2006 and June 30, 2010. Research approval for the data was granted by the Institutional Review Board of Texas A&M University. The risk assessment was administered through the Internet. The assessment began by asking respondents, who voluntarily accessed the survey, a series of self-identifying demographic questions about their race, sex, and age. These respondents were recruited to the survey through marketing activities of the hospitals sponsoring the Heart Aware assessment, such as television media advertising campaigns and print media placed throughout both inpatient and outpatient facilities. Respondents were then asked to report their height, weight, whether they used tobacco, level of physical activity, DBP, SBP, TC, HDL-C, and LDL-C. Respondents were then asked to report the last time their health care provider measured their blood pressure and cholesterol and checked their diabetes status. Finally, respondents were asked a series of questions about their family history, current medications, and current health history, with specific emphasis on CHD symptoms or diagnoses. As the respondents moved through the assessment, 2 unique tools reported data back to the respondents. First, as questions were answered by the respondents, a visual scale indicating the risk for CHD attributable to each risk factor was displayed. Scales for each risk factor were indicated on a color spectrum from green (low risk) to red (critical risk), and were updated as additional information was provided by the respondent. Second, as the respondents moved through different categories of questions, the tool provided the respondents with education about CHD and associated risk factors. This occurred in the form of text boxes on response pages. Definitions of medical terms were also provided to enhance the respondents’ knowledge of CHD and related risk factors.
The clinical sites that elected to offer the Heart Aware assessment determined the level of risk that would prompt the clinical site to extend an invitation to the respondent for a free on-site clinical risk assessment in which their self-reported CHD risk factor values would be measured by a clinician for comparison and validation. Each site set its own criteria for inviting participants to the free clinical assessment; Navigant did not document this protocol, including the method of extending the invitation to the participants. Anecdotal evidence suggests respondents were more likely to be invited for a clinical assessment if their self-reported values indicated 2 or more CHD risk factors.
Coronary Heart Disease Risk Factors
The variables analyzed as CHD risk factors were collected from the Heart Aware risk factor assessment. These included self-reported and clinically measured values of TC, HDL-C, LDL-C, SBP, DBP, BMI, as well as DM status and tobacco use. These variables were reported in the clinically assessed dataset as ordinal scales based on their frequencies and clinical guideline ranges. This resulted in ordered categories (referred to as “ranges” below) within each variable as follows: TC (<160 mg/dL, 160-199 mg/dL, 200-239 mg/dL, 240-279 mg/dL, >279 mg/dL), HDL-C (<35 mg/dL, 35-44 mg/dL, 45-49 mg/dL, 50-59 mg/dL, and >59 mg/dL), LDL-C (<100 mg/dL, 100-129 mg/dL, 130-159 mg/dL, 160-189 mg/dL, and >189 mg/dL), SBP (<120 mm Hg, 120-129 mm Hg, 130-139 mm Hg, 140-159 mm Hg, 160-199 mm Hg, and >199 mm Hg), and DBP (<80 mm Hg, 80-84 mm Hg, 85-89 mm Hg, 90-99 mm Hg, 100-114 mm Hg, and >114 mm Hg). DM status and tobacco use were coded as yes or no responses.
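The ordered categories listed above can be expressed as a small lookup, which may help readers who want to reproduce the coding on their own data. The following Python sketch is illustrative only: the names `CUTPOINTS`, `to_range_index`, and `range_label` are hypothetical, and it assumes readings are recorded as whole numbers, as the published range boundaries imply.

```python
from bisect import bisect_right

# Lower bound of every category after the first, taken from the ranges
# listed in the text. Names and structure here are illustrative assumptions.
CUTPOINTS = {
    "TC": [160, 200, 240, 280],        # mg/dL
    "HDL-C": [35, 45, 50, 60],         # mg/dL
    "LDL-C": [100, 130, 160, 190],     # mg/dL
    "SBP": [120, 130, 140, 160, 200],  # mm Hg
    "DBP": [80, 85, 90, 100, 115],     # mm Hg
}


def to_range_index(variable, value):
    """Return the 0-based ordinal category for a whole-number reading."""
    return bisect_right(CUTPOINTS[variable], value)


def range_label(variable, index):
    """Return the printed range (for example '160-199') for a category index."""
    cuts = CUTPOINTS[variable]
    if index == 0:
        return f"<{cuts[0]}"
    if index == len(cuts):
        return f">{cuts[-1] - 1}"
    return f"{cuts[index - 1]}-{cuts[index] - 1}"
```

For example, `to_range_index("TC", 205)` gives the third TC category, and `range_label("TC", 2)` gives its printed range, `200-239`. Coding both the self-reported and the clinically measured value this way puts them on the same ordinal scale before agreement is assessed.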
Sample Selection Criteria
Respondents who provided a self-reported health assessment and were chosen for and participated in a free clinical assessment of their CHD risk factors were eligible for inclusion in the sample. Those with a prior history of CVD- and CHD-related procedures, such as stroke, acute myocardial infarction, abdominal aortic aneurysm, cardiac arrest, congestive heart failure, angioplasty, catheterization and stent procedures, heart bypass, and carotid procedures, were excluded from the sample to maintain the integrity of the study objective which was to evaluate how community-dwelling adults who had not received a diagnosis of CHD viewed their risk relative to their actual clinically measured risk for CHD.
Depending on the preferred strategy for addressing missing data, 3 samples were available for analysis. To maximize our understanding of the research questions in the study, all 3 samples were made available for analysis. First, analyses were conducted on the original dataset. Given default settings in Stata version 12 (Stata Corp LLP, College Station, TX, USA), this resulted in a listwise deleted sample. Second, analyses were conducted on the original dataset, but with a change in the default settings. Instead of eliminating cases missing any of the variables being analyzed (as was the case with the listwise deleted sample), cases were only removed on a variable-by-variable analysis basis. This resulted in a pairwise deleted sample. Finally, analyses were conducted on the imputed sample. Results of all analyses performed on all 3 samples were then examined.
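The difference between the listwise and pairwise deleted samples can be shown with a toy example. This Python sketch is not the Stata procedure used in the study; the record layout, the variable names, and the use of `None` for a missing value are assumptions made for illustration.

```python
def listwise_sample(rows, variables):
    """Keep only rows with no missing value (None) in ANY analyzed variable."""
    return [r for r in rows if all(r.get(v) is not None for v in variables)]


def pairwise_sample(rows, self_var, clinical_var):
    """Keep rows that are complete for one self-reported/clinical pair only."""
    return [
        r for r in rows
        if r.get(self_var) is not None and r.get(clinical_var) is not None
    ]


# Hypothetical records: category indices for SBP and HDL-C ranges.
rows = [
    {"sbp_self": 1, "sbp_clin": 1, "hdl_self": 2, "hdl_clin": 3},
    {"sbp_self": 2, "sbp_clin": 3, "hdl_self": None, "hdl_clin": 1},
    {"sbp_self": None, "sbp_clin": 0, "hdl_self": 4, "hdl_clin": 4},
]
all_vars = ["sbp_self", "sbp_clin", "hdl_self", "hdl_clin"]

complete = listwise_sample(rows, all_vars)                 # 1 row survives
sbp_pairs = pairwise_sample(rows, "sbp_self", "sbp_clin")  # 2 rows survive
hdl_pairs = pairwise_sample(rows, "hdl_self", "hdl_clin")  # 2 rows survive
```

Listwise deletion yields one fixed sample for every analysis, whereas pairwise deletion yields a different sample size for each risk factor, which is why the pairwise deleted sample has no single n.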
Self-Assessment Missing Data
Self-reported health status data are known to contain substantial amounts of missing data. Participants often choose not to answer certain questions for a variety of reasons, such as lack of knowledge, time constraints in answering the survey, or a desire not to answer certain questions based on individual preferences. Missing data introduces many analytical challenges, especially relating to biased statistical estimators when making inferences from data [19,20]. To address missing data issues, a 3-step process was used to evaluate the missing data. First, data were analyzed for the degree of missing data, such as the number of missing responses across individual variables and individual cases. Second, the pattern of missing data was analyzed using the “mvpatterns” user-written command in Stata version 12. This command allowed the researcher to determine whether the pattern of missing data was missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). Third, based on the result of the first 2 steps noted previously, a methodology was devised to adjust the dataset where appropriate to account for missing data where it was deemed a concern. Several techniques were evaluated for this purpose such as listwise deletion and multiple imputation. Multiple imputation was selected as the preferred method for addressing self-reported missing data because of the large number of missing responses and MAR-identified pattern of missing data discussed later in this paper. Multivariate normal imputation with 5 imputations was used to impute missing values of SBP, DBP, and TC because these variables were most closely aligned with the Framingham 10-year CHD risk model. TC was analyzed because it had more complete self-reported data than HDL-C and LDL-C. These latter variables were key considerations when analyzing missing values in the clinically measured dataset discussed subsequently. Multivariate imputation was chosen because of its ability to take advantage of all variables in the analysis to impute the selected missing variables. Five imputations were selected to balance statistical rigor with processing speed. Independent variables used to impute were selected based on the completeness of data and their theoretical association with the imputed variables. The independent variables used for imputation were age, rural/urban designation, sex, DM status, and BMI range.
The analysis of missing data for the self-reported data was vital to the integrity of the study. Because the self-reported values were used as the basis for choosing individuals for a free clinical assessment, missing data could have profound implications for how participants were selected, possibly resulting in selection bias. Because the clinical sites set their own selection criteria that were not chronicled by Navigant, it was even more important to examine whether the missing self-reported data influenced selection by the clinical sites. This test for selection bias in the clinical assessment process was conducted using the original dataset and the imputed dataset. Using both datasets individually, the analysis was done by examining differences in risk factor variable means between those individuals selected for clinical assessment and those individuals not chosen for clinical assessment. For this purpose, t tests with statistical significance determined at the alpha=.05 level were used.
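The difference-in-means comparison described above can be sketched as follows. This is an illustrative sketch rather than the study's Stata procedure: the source states only that t tests were used at alpha=.05, so the Welch (unequal variance) form, the function names, and the large-sample normal approximation for the P value are all assumptions.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(sample_a, sample_b):
    """Welch t statistic for the difference in means of two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    std_error = sqrt(variance(sample_a) / n_a + variance(sample_b) / n_b)
    return (mean(sample_a) - mean(sample_b)) / std_error


def two_sided_p_large_sample(t_stat):
    """Two-sided P value from a normal approximation.

    Reasonable only for large samples, where the t distribution is close to
    the standard normal; small samples need the t distribution itself.
    """
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))


def is_significant(t_stat, alpha=0.05):
    """Apply the alpha=.05 decision rule used in the text."""
    return two_sided_p_large_sample(t_stat) < alpha
```

Here `sample_a` would hold a risk factor's self-reported values for respondents selected for clinical assessment and `sample_b` the values for those not selected; a significant difference would indicate that selection was related to that risk factor.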
Clinical Assessment Missing Data
Although the statistical challenges of missing data noted for the self-reported data also applied to the clinically measured data, there were additional complexities that necessitated a separate analysis of the clinically measured dataset. First, when analyzing clinically measured data, many of the variables used in the Framingham 10-year CHD risk model were reported as ranges by the clinical sites. These ranges corresponded to the ranges used in the Framingham 10-year CHD risk model. Second, the pattern of missing data in the clinically measured dataset was different than the self-reported dataset. As such, methodologies used to address missing data concerns in the clinically measured dataset had to recognize both unique patterns of missing data and the fact the data were now reflected in ranges unlike individual values found in the self-reported dataset. Given the MAR-identified pattern of missing data discussed later in this paper, missing data in the clinically measured dataset was imputed with an ordered logistic model versus the multivariate normal model discussed previously. The ordered logistic imputation was carried out with 5 imputations and was used to impute missing values for ranges of SBP, DBP, TC, HDL-C, and LDL-C because these variables were most closely aligned with the Framingham 10-year CHD risk model. Note the imputation of the clinically measured dataset imputed HDL-C and LDL-C in addition to TC. This was because these variables provided a more detailed view of dyslipidemia relative to TC. It was possible to do these imputations in the clinically measured dataset because there was not as much missing data relative to the self-reported dataset. Independent variables used to impute were selected based on the completeness of data and their theoretical association with the imputed variables. The independent variables used for imputation were age, rural/urban designation, sex, DM status, and BMI range.
Descriptive Statistics and Kappa Coefficients of Interrater Agreement
Descriptive statistics were calculated for each of the 3 samples. The following variables were stratified by sex: age, race/ethnicity, rural/urban designation, SBP, DBP, TC, HDL-C, LDL-C, DM status, and tobacco use. Continuous variables were compared with t statistics, and categorical variables were compared with chi-square statistics. Descriptive analyses were performed using Stata version 12. Test statistics were evaluated for statistical significance at the alpha=.05 level.
Weighted kappa coefficients of interrater agreement between self-reported and clinically measured CHD risk factors were calculated for males and females. A weighted kappa coefficient was used instead of an unweighted kappa coefficient to account for the degree of discordance between each self-reported and clinically measured observation based on the fact the data were categorized as ordinal ranges. The weighting procedure was performed in Stata version 12 using the “wgt(w)” extension of the “kapci” command. The weighted kappa procedure included 100 repetitions. Each coefficient was reported with its standard error and bias-corrected 95% confidence interval. According to established literature, the strength of interrater agreement of the kappa coefficient is described as poor (κ<.00), slight (κ=.00-.20), fair (κ=.21-.40), moderate (κ=.41-.60), substantial (κ=.61-.80), and almost perfect (κ=.81-1.00).
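For readers unfamiliar with the statistic, a weighted kappa can be sketched in a few lines of Python. This is an illustrative sketch, not the Stata “kapci” implementation: it assumes linear agreement weights of 1 − |i − j|/(k − 1), which is how Stata documents its “wgt(w)” option, and it omits the 100 bootstrap repetitions used for the bias-corrected confidence intervals. The function names are hypothetical.

```python
def weighted_kappa(self_reported, clinical, n_categories):
    """Cohen's kappa with linear agreement weights w_ij = 1 - |i - j| / (k - 1).

    Inputs are equal-length sequences of 0-based ordinal category indices.
    """
    k, n = n_categories, len(self_reported)
    if n == 0 or n != len(clinical) or k < 2:
        raise ValueError("need paired observations and at least 2 categories")

    # Cross-tabulate self-reported (rows) against clinical (columns).
    counts = [[0] * k for _ in range(k)]
    for i, j in zip(self_reported, clinical):
        counts[i][j] += 1
    row_totals = [sum(counts[i]) for i in range(k)]
    col_totals = [sum(counts[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        return 1 - abs(i - j) / (k - 1)

    cells = [(i, j) for i in range(k) for j in range(k)]
    # Weighted observed agreement and weighted agreement expected by chance.
    p_obs = sum(weight(i, j) * counts[i][j] for i, j in cells) / n
    p_exp = sum(weight(i, j) * row_totals[i] * col_totals[j] for i, j in cells) / n**2
    if p_exp == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_obs - p_exp) / (1 - p_exp)


def agreement_label(kappa):
    """Map kappa to the strength-of-agreement labels quoted in the text."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"
```

With linear weights, a self-report that lands one range away from the clinical value is penalized less than one that lands four ranges away, which an unweighted kappa would treat identically. Under the labels above, `agreement_label(0.49)` is "moderate" and `agreement_label(0.25)` is "fair".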
Framingham Heart Study’s 10-Year CHD Risk Model
The 10-year CHD risk model from the Framingham Heart Study was used to calculate each respondent’s 10-year CHD risk based on their clinically measured risk factor values. The weighted kappa coefficients were then calculated for each stratum of 10-year CHD risk scores to allow for an evaluation of risk factor agreement by level of CHD risk.
Self-Reported Data, Missing Data, and Participant Selection
The Heart Aware cardiovascular health risk assessment was taken by 373,085 individuals. The clinical sites provided a clinical assessment to 22,346 (5.99%) of these individuals (note the number of individuals offered an assessment but not taking an assessment was not recorded). Among those responding to the self-reported assessment, 238,081 (63.81%) of the respondents did not answer at least 1 of the risk factor variable questions. Because the tables that follow present a large amount of information, extended analysis of the results is provided in the Discussion section of this paper. Table 1 reports the results of the nonresponse rate for each risk factor variable in the self-reported dataset and in the imputed self-reported dataset.
Table 1. Missing data from the self-reported dataset (n=373,085) and the self-reported imputed dataset (n=373,085).
Table 2 reports the results of the test of difference in means for the risk factor variables of those selected for clinical assessment from the self-reported dataset. It should be noted the table compares the clinically measured dataset for each variable unadjusted by imputation methods (ie, each self-reported variable has its own sample size). The sample size and very limited amount of missing data for each variable can be examined in Table 3. Table 3 also reports the same type of information as Table 2, but for those selected for clinical assessment adjusted by imputation methods. It is clear from both these tables that those with clinically measured risk factors had significantly higher values (ie, poorer values) than the overall self-reported population. This confirms the recruitment strategy of the clinics to find those at high risk of CHD and invite them for a free clinical screening of their risk factors.
Table 2. Self-reported and self-reported imputed versus clinically measured dataset: difference in means on critical risk factor values.
Missing Clinical Assessment Data
Clinically measured risk factor values were reported for 22,346 individuals. This dataset consisted of clinically measured data that were expressed as ordinal (ie, ranges of clinical values) and binary data (eg, DM status and tobacco use). Among these individuals, 6423 (28.74%) of the respondents did not answer at least 1 of the risk factor variable questions.
Table 3 reports the results of the nonresponse rate for each risk factor variable in the clinically measured dataset, including the self-reported nonresponse rate for the same respondents. Table 3 also reports the results of the nonresponse rate for each risk factor variable in the imputed clinically measured dataset.
Table 3. Missing data for the clinically measured dataset (n=22,346) and the clinically measured imputed dataset (n=22,346).
Clinical Assessment Data, Descriptive Statistics: Listwise Deletion, Pairwise Deletion, and Imputed Samples
Table 4 reports the differences in means and proportions of the CHD risk factor variables by sex within the listwise deleted sample (n=5951).
Table 5 reports the differences in means and proportions of the risk factor variables by sex within the pairwise deleted sample (note the sample size varies by each risk factor variable as indicated in the table).
Table 6 reports the differences in means and proportions of the risk factor variables by sex within the imputed sample (n=16,879).
Table 4. Listwise deleted sample (n=5951): differences in means and proportions by sex.
Table 5. Differences in means and proportions by sex in the pairwise deleted sample.
Table 6. Differences in means and proportions by sex for imputed sample (n=16,879).
Weighted Kappa Agreement and Agreement by Risk Stratification
Table 7 reports the results of the weighted kappa interrater agreement analysis by sex for each CHD risk factor variable in the listwise deleted sample. In addition to the weighted kappa statistic, its standard error and bias-corrected 95% confidence interval were reported along with an estimate of the 10-year CHD risk score for each variable’s strata of clinical values.
Table 7. Interrater agreement of self-reported and clinically measured Framingham 10-year CHD risk factors by risk score for the listwise deleted sample (n=5951).
Table 8 reports the results of the weighted kappa interrater agreement analysis by sex for each CHD risk factor variable in the pairwise deleted sample. In addition to the weighted kappa statistic, its standard error and bias-corrected 95% confidence interval were reported along with an estimate of the 10-year CHD risk score for each variable’s strata of clinical values.
Table 8. Interrater agreement of self-reported and clinically measured Framingham 10-year CHD risk factors by risk score for the pairwise deleted sample.
Table 9 reports the results of the weighted kappa interrater agreement analysis by sex for each CHD risk factor variable in the imputed sample. In addition to the weighted kappa statistic, its standard error and bias-corrected 95% confidence interval were reported along with an estimate of the 10-year CHD risk score for each variable’s strata of clinical values.
Table 9. Imputed sample (n=16,879): interrater agreement of self-reported and clinically measured Framingham 10-year CHD risk factors by risk score.
When evaluating the trends of interrater agreement between self-reported and clinically measured CHD risk factors, it is important to evaluate both the baseline interrater agreement coefficients for the entire sample and the individual interrater agreement coefficients for the strata based on 10-year CHD risk. Further, it is important to examine the changes in the interrater agreement as 10-year CHD risk increases.
Although there are some noteworthy differences between the listwise deleted and pairwise deleted samples (eg, the deterioration of interrater agreement by strata for SBP and DBP as 10-year CHD risk increases among males in the listwise deleted sample but not the pairwise deleted sample), the main outcome of interest is the difference in baseline interrater agreement coefficients of the imputed sample versus the listwise and pairwise deleted samples. Overall, the baseline interrater agreement coefficient values for each risk factor in the imputed sample were markedly lower than their counterparts in the listwise and pairwise deleted samples. For example, among males in the listwise deleted sample, the interrater agreement coefficient of self-reported and clinically measured ranges of HDL-C was kappa=.49. By comparison, the same coefficient in the imputed sample was .25. This discrepancy was substantial across the risk factor values with the largest amount of missing data (ie, ranges of TC, HDL-C, and LDL-C). By comparison, the differences in interrater agreement coefficients of variables other than ranges of TC, HDL-C, and LDL-C between the listwise deleted and imputed samples were minor. For example, among females in the listwise deleted sample, the interrater agreement coefficient of self-reported and clinically measured SBP was .45. By comparison, the same coefficient in the imputed sample was .39.
As discussed previously, one of the CHD risk factors thought to be less understood by community-dwelling adults is HDL-C. It is noteworthy that both males and females with the highest 10-year risk of CHD in the imputation sample had the lowest level of interrater agreement between self-reported and clinically measured ranges of HDL-C. In fact, the level of agreement can only be characterized as slight, which is a suboptimal level of agreement. Although the difference between a 3% 10-year risk of CHD and an 8% 10-year risk of CHD may seem numerically immaterial, it should be noted these figures are derived from Framingham’s clinical risk model, which means the difference between 3% and 8% is more than twice the mortality risk of CHD in the next 10 years. Thus, the difference is clinically relevant.
Conversely, interrater agreement of self-reported and clinically measured ranges of LDL-C slightly increased in both sexes as 10-year CHD risk increased. This is consistent with the layperson hypothesis that individuals with higher risk of CHD would be more conscious of LDL-C because it is often referred to as “bad” cholesterol. This finding is also supported with recent evidence suggesting diabetes patients who recall their most recent LDL-C values are more likely to maintain optimal hemoglobin A1C values. LDL-C could simply be the metric noted by community-dwelling adults as the most important metric to gauge in order to avoid CHD and related diseases. This is certainly consistent with how patients have been conditioned to assume LDL-C is bad cholesterol and HDL-C is good cholesterol (a belief that is the subject of rigorous investigation). If HDL-C is eventually deemed to be just as clinically important as LDL-C, a substantial public health information campaign may be necessary to inculcate this knowledge and its importance among a public much more likely to appreciate CHD risk due to LDL-C.
Sensitivity to Missing Data Techniques
Upon examining the differences of interrater agreement coefficients by the approach used to address missing data, 2 things become apparent. First, ranges of SBP, DBP, and both tobacco use and DM status were not substantially different based on the approach employed to account for missing data. This was mostly because of fewer instances of missing data than other variables in the original dataset. As such, it is appropriate to use any of the 3 samples to establish findings about interrater agreement relative to these variables in the study. However, given the significant amount of missing data for ranges of TC, HDL-C, and LDL-C, the multiple imputation strategy resulted in more conservative results of interrater agreement than the listwise and pairwise deleted samples. As such, the researcher is cautioned to use these figures when establishing findings from the study. Because of these facts, the multiple imputation sample was deemed the most appropriate for discussing findings of this study. This is because the imputation sample was conservative on the variables with greatest instances of missing data, but consistent with the other 2 methodologies for the variables with fewer instances of missing data.
Sample Versus Population-Based Coronary Heart Disease Risk Data
Figure 1 illustrates the comparative 10-year CHD risk score as established by the Framingham Heart Study for the general male population by age group. The results from the listwise deleted and imputed samples, respectively, are also shown for comparison. Figure 2 illustrates the comparative 10-year CHD risk score as established by the Framingham Heart Study for the general female population by age group. The results from the listwise deleted and imputed samples, respectively, are also shown for comparison.
Figure 1. Comparison of CHD risk by sample type relative to the overall population for males.
Figure 2. Comparison of CHD risk by sample type relative to the overall population for females.
Heart Aware May Underestimate Population Risk for Coronary Heart Disease
The most significant finding from this study is that a community-based health risk assessment for heart disease delivered via the Internet (Heart Aware) yields a sample with markedly lower risk of CHD than suggested by population health data. For example, Figure 1 demonstrates that regardless of the method used to account for missing data, males participating in the Heart Aware assessment had, on average, a 10-year CHD risk that was up to 19 percentage points lower than their counterparts of the same age. Likewise, Figure 2 demonstrates that regardless of the method used to account for missing data, females participating in the Heart Aware assessment had, on average, a 10-year CHD risk that was up to 7 percentage points lower than their counterparts of the same age.
It should be noted this problem is compounded by the fact participants selected for clinical evaluation in this study were hand-picked by the individual hospitals based on their perceived high risk of CHD (ie, a random selection of community-dwelling adults for clinical measurement of CHD risk factors would likely result in samples with lower CHD risk, thereby exacerbating the differences between the risk of samples established by the Heart Aware assessment versus population health data), recognizing that anyone with diagnosed CHD was excluded from the study.
Several reasons could explain these discrepancies. The Heart Aware assessments were offered almost entirely via the Internet. This probably resulted in biased selection of participants because those using the Internet are generally more technically savvy, have higher levels of education and income, and are comparatively healthier than non-Internet users. As such, it is not surprising that the tool procured a lower-risk sample that is not representative of the general population. This raises a very important issue about health risk assessments such as Heart Aware. Recruitment cost is greatly reduced by using the Internet, especially through a hospital’s existing Internet presence. However, if the data are to be used for public health purposes, how can the data be made more representative of the population? One approach could be to expand the methods used to collect the same data, such as using in-clinic kiosks rather than relying on a participant having Internet resources available at home. If financial resources were available, the instrument could be administered through a random-digit dial survey, a method that has been shown to improve validity in other CVD-related studies. Finally, a simple and cost-effective method could be to use propensity score matching to create appropriate comparison groups for analysis, an approach that is also common in CVD-related studies. However, in the case of the current Heart Aware survey, additional variables would need to be collected to account for the underlying demographic differences that are likely associated with the Internet selection bias discussed previously (eg, income, education level, and current health care utilization).
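As a rough illustration of the propensity score matching idea raised above, the sketch below pairs Internet respondents with members of a reference sample whose propensity scores are similar. It is a hypothetical example, not a method used in this study: it assumes the propensity scores have already been estimated elsewhere (eg, from income, education level, and health care utilization), and the function name, identifiers, caliper value, and scores are all invented.

```python
def match_nearest(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on propensity score, without replacement.

    treated, controls: dicts mapping a respondent id to an (already estimated)
    propensity score. Returns a dict mapping treated id -> matched control id.
    Treated respondents with no control within the caliper stay unmatched.
    """
    available = dict(controls)  # copy so the caller's dict is not modified
    matches = {}
    # Process treated respondents in score order so results are deterministic.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control by absolute score distance.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            matches[t_id] = c_id
            del available[c_id]  # each control is used at most once
    return matches


# Invented scores: probability of being an Internet respondent given covariates.
internet_respondents = {"w1": 0.80, "w2": 0.55, "w3": 0.30}
reference_sample = {"r1": 0.78, "r2": 0.52, "r3": 0.10, "r4": 0.62}

matched = match_nearest(internet_respondents, reference_sample, caliper=0.05)

if __name__ == "__main__":
    print(matched)
```

Respondents without a sufficiently close counterpart (here `w3`) are left out of the matched comparison. This is the usual trade-off of a caliper: closer matches at the cost of a smaller analytic sample.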
It is difficult to understand the influence of the selection issue on the study results. Although we know the samples in this study consisted of individuals with much lower CHD risk relative to population health data, and we did exclude those who had already been diagnosed with CHD, the relationship between levels of CHD risk and knowledge of CHD risk factors has yet to be firmly established. In fact, in this study, the interrater agreement of self-reported and clinically measured CHD risk factors varied by sex, individual risk factor, and overall 10-year CHD risk stratum.
Sex-Based Differences Were Not Apparent
As noted previously, the Red Dress symbol and Go Red for Women campaign have been high-profile efforts to highlight the fact that more females of all ages die of CHD than of any other cause. Yet, the results of this study indicate a very similar level of awareness of CHD risk factors among the sexes. It is difficult to reconcile these data with the potential success or failure of these high-profile campaigns specific to females. Although there was no marked difference in the interrater agreement between males and females on each measured CHD risk factor in the imputed sample, it is possible that awareness among females would have been worse without the public campaigns focused on them.
What is clear about the differences in the interrater agreement of self-reported and clinically measured CHD risk factors by sex is that neither sex has demonstrated a superior understanding of their CHD risk factors. Both sexes demonstrate relatively low levels of agreement on every CHD risk factor. It should also be noted that females were generally “healthier” than their male counterparts in this study (see Table 6). It seems reasonable that such a difference would influence CHD risk factor agreement by sex to a greater degree than witnessed in this study. This is an area for continued exploration because it is central to public health policy.
In addition to the noteworthy findings of this study, there are several limitations. First, the most substantial limitation is a challenge to the internal validity of the results from a substantial amount of selection bias that was likely the result of the recruitment method (ie, offering the survey to any interested party through media advertising). However, as Guba reminds us in a classic work on naturalistic studies, the process of determining validity in such studies is not comparable with that of rationalistic designs such as randomized controlled trials. Naturalistic studies have a wide array of tools to complement the rationalistic approach to establishing comparable levels of study integrity and quality. Among these methods are techniques such as triangulation of results, replication, and comprehensive descriptive statistics to ensure a thorough understanding of the sample. If naturalistic designs are fundamentally characterized as research conducted in natural settings versus structured environments such as laboratories, then Heart Aware should qualify as a tool used in a naturalistic setting. Within this paradigm, the study does accomplish some of the processes desired by naturalistic researchers, such as the exhaustive approach to examining missing data, the use of multiple imputation to ensure replication of the results displayed in the imputed sample, and the 100 repetitions conducted on each weighted kappa analysis of interrater agreement between self-reported and clinically measured CHD risk factors. Nevertheless, future research in this area should incorporate some of the suggestions made previously to counteract the apparent selection bias of solely using the Internet for recruitment.
The second limitation is the amount of missing data. Although this study has attempted to mitigate this point with multiple approaches, none of these efforts can fully account for the bias that exists in statistical estimation as a result of missing responses. At a very basic level, the latent traits of missing responses remain unknown even with the most sophisticated missing data techniques. However, it should be noted that repetition and replication (as noted previously) somewhat mitigate these biases.
The third limitation of the study is the difference between sexes for baseline health behaviors and clinical values. Although some of the clinical value differences reflect normal differences based on sex, some of the discrepancies are very large, indicating that females were probably healthier than their male counterparts. This limits the ability to fully understand the results of the study by sex.
The fourth limitation of the study is the lack of research on how respondents acquired information about their self-reported risk factor values. There could be an element of self-education or access to professional resources that plays a role in the findings of the study.
Finally, it was not possible to exclude individuals from the study who had undiagnosed CHD. The sample likely contained some of these individuals, and this could have contributed to selection bias concerns.
The findings from this study have a unique place in the literature based on the large sample size, breadth of heart disease self-reported risk factors collected, and the method of data collection (ie, the Internet). However, this was a cross-sectional study that lacks the internal validity of a stronger design such as a randomized controlled trial. Future efforts in this field would benefit from a prospective randomized study design to ensure some of the self-selection biases and other limitations of this study are appropriately addressed.
This study sought to understand which CHD risk factors were best understood by community-dwelling adults who took an Internet-based CHD risk assessment (ie, Heart Aware). It also sought to examine whether such levels of understanding were associated with varying degrees of 10-year CHD risk for each participant. What the study has shown is that although all CHD risk factors had suboptimal levels of interrater agreement between self-reported and clinically measured values, the CHD risk factor with the greatest discordance was HDL-C. This is consistent with the literature noted previously. However, this study provides unique support to this finding by incorporating a thorough review of how interrater agreement coefficients change based on approaches to missing data. Because missing data are a key analytical issue in many surveillance studies, the current study provides a robust view that supports the findings of interrater agreement for HDL-C in a variety of methodological settings. Further, these findings were drawn from a very large sample across more than 100 hospitals.
Unlike prior research efforts, this study stratified interrater agreement of self-reported and clinically measured CHD risk factors by 10-year CHD risk as established by the Framingham Heart Study. This allowed the current study to make a very important contribution to the literature: the discovery that interrater agreement for HDL-C deteriorates as 10-year CHD risk increases, whereas interrater agreement for LDL-C improves as 10-year CHD risk increases. This is a powerful finding because it not only supports the literature noted previously regarding the lack of knowledge of HDL-C among community-dwelling adults, but also shows how the same individuals view LDL-C. This finding has substantial implications for the health literacy, social and behavioral health, and public health implementation science communities. If the evidence of HDL-C as a protective factor for CHD continues to mature, it will be vital to translate these clinical findings into actionable public health information campaigns in the community.
Several broad themes should be drawn from this study. First, tools such as Heart Aware could be a cost-effective way to collect valuable CHD risk factor data. Researchers should begin to think about leveraging such technology by partnering with private sector firms to improve public health datasets. Such efforts can only improve public health surveillance, which is positive for researchers, policymakers, private sector firms, and community-dwelling adults. However, additional recruitment methodologies should be employed (in addition to the Internet) to reduce selection bias. Second, this research confirms the continuing need to educate community-dwelling adults about the need to understand their CHD risk factors. This is especially true regarding HDL-C and LDL-C. Finally, this research raises questions about how to use stratification of CHD risk factor agreement by 10-year CHD risk as a clinical strategy. Very few differences in interrater agreement for any CHD risk factor by 10-year CHD risk were identified in this study. Clinicians may want to consider additional strategies to improve CHD risk factor knowledge among those who currently exhibit the greatest chance of a CHD event in the next 10 years.
We would like to acknowledge Navigant Consulting, Inc for their collaboration in this study.
Conflicts of Interest
Justin B Dickerson, PhD, MBA performed intermittent paid consulting services for Navigant Consulting, Inc on separate data not related to the data used for this research.
- D'Agostino RB, Vasan RS, Pencina MJ, Wolf PA, Cobain M, Massaro JM, et al. General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation 2008 Feb 12;117(6):743-753.
- Centers for Disease Control and Prevention. 2012. Leading Causes of Death URL: http://www.cdc.gov/nchs/fastats/lcod.htm [accessed 2013-07-09]
- Centers for Disease Control and Prevention. 2010. Heart Disease Facts URL: http://www.cdc.gov/heartdisease/facts.htm [accessed 2013-07-09]
- Heidenreich PA, Trogdon JG, Khavjou OA, Butler J, Dracup K, Ezekowitz MD, American Heart Association Advocacy Coordinating Committee, Stroke Council, Council on Cardiovascular Radiology and Intervention, Council on Clinical Cardiology, Council on Epidemiology and Prevention, Council on Arteriosclerosis, Thrombosis and Vascular Biology, Council on Cardiopulmonary, Critical Care, Perioperative and Resuscitation, Council on Cardiovascular Nursing, Council on the Kidney in Cardiovascular Disease, Council on Cardiovascular Surgery and Anesthesia, Interdisciplinary Council on Quality of Care and Outcomes Research. Forecasting the future of cardiovascular disease in the United States: a policy statement from the American Heart Association. Circulation 2011 Mar 1;123(8):933-944.
- Centers for Disease Control and Prevention. 2009. Heart Disease Risk Factors URL: http://www.cdc.gov/heartdisease/risk_factors.htm [accessed 2013-07-09]
- Broekhuizen K, van Poppel MN, Koppes LL, Kindt I, Brug J, van Mechelen W. No significant improvement of cardiovascular disease risk indicators by a lifestyle intervention in people with familial hypercholesterolemia compared to usual care: results of a randomised controlled trial. BMC Res Notes 2012;5:181.
- National Heart, Lung, and Blood Institute. 2012. The Heart Truth: A Campaign for Women about Heart Disease URL: http://www.nhlbi.nih.gov/educational/hearttruth/ [accessed 2013-07-09]
- American Heart Association. 2012. Go Red For Women URL: http://www.goredforwomen.org/ [accessed 2013-07-09]
- Dorner T, Fodor JG, Lawrence K, Ludvik B, Rieder A. HDL-knowledge in the lay public: results of a representative population survey. Atherosclerosis 2007 Nov;195(1):195-198.
- Tolonen H, Keil U, Ferrario M, Evans A, WHO MONICA Project. Prevalence, awareness and treatment of hypercholesterolaemia in 32 populations: results from the WHO MONICA Project. Int J Epidemiol 2005 Feb;34(1):181-192.
- Gorber SC, Tremblay M, Campbell N, Hardt J. The accuracy of self-reported hypertension: A systematic review and meta-analysis. CHYR 2008 Feb 01;4(1):36-62.
- Taylor A, Dal Grande E, Gill T, Pickering S, Grant J, Adams R, et al. Comparing self-reported and measured high blood pressure and high cholesterol status using data from a large representative cohort study. Aust N Z J Public Health 2010 Aug;34(4):394-400.
- Ayala C, Neff LJ, Croft JB, Keenan NL, Malarcher AM, Hyduk A, et al. Prevalence of self-reported high blood pressure awareness, advice received from health professionals, and actions taken to reduce high blood pressure among US adults--Healthstyles 2002. J Clin Hypertens (Greenwich) 2005 Sep;7(9):513-519.
- Huang PY, Buring JE, Ridker PM, Glynn RJ. Awareness, accuracy, and predictive validity of self-reported cholesterol in women. J Gen Intern Med 2007 May;22(5):606-613.
- Ahluwalia IB, Tessaro I, Rye S, Parker L. Self-reported and clinical measurement of three chronic disease risks among low-income women in West Virginia. J Womens Health (Larchmt) 2009 Nov;18(11):1857-1862.
- Roger VL, Go AS, Lloyd-Jones DM, Adams RJ, Berry JD, Brown TM, American Heart Association Statistics Committee and Stroke Statistics Subcommittee. Heart disease and stroke statistics--2011 update: a report from the American Heart Association. Circulation 2011 Feb 1;123(4):e18-e209.
- Framingham Heart Study. 2011. General Cardiovascular Disease (10-year risk) URL: http://www.framinghamheartstudy.org/risk/gencardio.html [accessed 2013-07-09]
- Fox-Wasylyshyn SM, El-Masri MM. Handling missing data in self-report measures. Res Nurs Health 2005 Dec;28(6):488-495.
- Little R, Rubin DB. Statistical Analysis with Missing Data. Hoboken, NJ: Wiley; 2002.
- Rubin DB. Multiple Imputation for Nonresponse in Surveys. New York: Wiley; 1987.
- Weesie J. Stata. 2012. Mvpatterns (user-written software program for Stata) URL: http://www.stata.com/stb/stb61/dm91/mvpatterns.hlp [accessed 2014-04-09]
- UCLA Institute for Digital Research and Education. 2012. Statistical Computing Seminars: Multiple Imputation in Stata, Part 2 URL: http://www.ats.ucla.edu/stat/stata/seminars/missing_data/mi_in_stata_pt2.htm [accessed 2013-07-09]
- Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977 Mar;33(1):159-174.
- Framingham Heart Study. 2011. 10-Year Coronary Heart Disease Risk Score URL: http://www.framinghamheartstudy.org/risk/coronary.html [accessed 2013-07-09]
- Stark Casagrande S, Ríos Burrows N, Geiss LS, Bainbridge KE, Fradkin JE, Cowie CC. Diabetes knowledge and its relationship with achieving treatment recommendations in a national sample of people with type 2 diabetes. Diabetes Care 2012 Jul;35(7):1556-1565.
- Dolgin E. Trial puts niacin--and cholesterol dogma--in the line of fire. Nat Med 2011 Jul;17(7):756.
- Wangberg SC, Andreassen HK, Prokosch HU, Santana SM, Sørensen T, Chronaki CE. Relations between Internet use, socio-economic status (SES), social support and subjective health. Health Promot Int 2008 Mar;23(1):70-77.
- Kargman DE, Sacco RL, Boden-Albala B, Paik MC, Hauser WA, Shea S. Validity of telephone interview data for vascular disease risk factors in a racially mixed urban community: the Northern Manhattan Stroke Study. Neuroepidemiology 1999;18(4):174-184.
- Cao L, Young N, Liu H, Silvestry S, Sun W, Zhao N, et al. Preoperative aspirin use and outcomes in cardiac surgery patients. Ann Surg 2012 Feb;255(2):399-404.
- Centers for Disease Control and Prevention. 2007. Leading Causes of Death in Females URL: http://www.cdc.gov/women/lcod/ [accessed 2013-07-09]
- Guba EG. Criteria for assessing the trustworthiness of naturalistic inquiries. Educational Communication and Technology Journal 1981;29(2):75-91.
- Bowen GA. Naturalistic inquiry and the saturation concept: a research note. Qualitative Research 2008 Feb 01;8(1):137-152.
- Buhi ER, Goodson P, Neilands TB. Out of sight, not out of mind: strategies for handling missing data. Am J Health Behav 2008;32(1):83-92.
|AHA: American Heart Association|
|BMI: body mass index|
|CHD: coronary heart disease|
|CVD: cardiovascular disease|
|DBP: diastolic blood pressure|
|DM: diabetes mellitus|
|HDL-C: high-density lipoprotein cholesterol|
|LDL-C: low-density lipoprotein cholesterol|
|MI: multiple imputation|
|NHLBI: National Heart Lung and Blood Institute|
|SBP: systolic blood pressure|
|TC: total cholesterol|
|Edited by G Eysenbach; submitted 26.09.12; peer-reviewed by F Grajales III, E Waters; comments to author 21.01.13; revised version received 18.04.13; accepted 21.08.13; published 18.04.14|
Please cite as:
Dickerson JB, McNeal CJ, Tsai G, Rivera CM, Smith ML, Ohsfeldt RL, Ory MG
Can an Internet-Based Health Risk Assessment Highlight Problems of Heart Disease Risk Factor Awareness? A Cross-Sectional Analysis
J Med Internet Res 2014;16(4):e106
Copyright©Justin B Dickerson, Catherine J McNeal, Ginger Tsai, Cathleen M Rivera, Matthew Lee Smith, Robert L Ohsfeldt, Marcia G Ory. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 18.04.2014.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.