Accurate assessment of the difficulty of consumer health texts is a prerequisite for improving readability. General purpose readability formulas based primarily on word length are not well suited for the health domain, where short technical terms may be unfamiliar to consumers. To address this need, we previously developed a regression model for predicting “average familiarity” with consumer health vocabulary (CHV) terms.
The primary goal was to evaluate the ability of the CHV term familiarity model to predict (1) surface-level familiarity of health-related terms and (2) understanding of the underlying meaning (concept familiarity) among actual consumers. Secondary goals involved exploring the effect of demographic factors (eg, health literacy) on surface-level and concept-level familiarity and describing the relationship between the two levels of familiarity.
Survey instruments for assessing surface-level familiarity (45 items) and concept-level familiarity (15 items) were developed. All participants also completed a demographic survey and a standardized health literacy assessment, S-TOFHLA.
Based on surveys completed by 52 consumers, linear regression suggests that predicted CHV term familiarity is a statistically significant predictor (
This exploratory study suggests that the CHV term familiarity model is predictive of consumer recognition and understanding of terms in the health domain. Potential uses of such a model include readability formulas tailored to the consumer health domain and tools to “translate” professional medical documents into text that is more accessible to consumers. The study also highlights the usefulness of distinguishing between surface-level term familiarity and deeper concept understanding and presents one method for assessing familiarity at each level.
Improving the readability of online consumer health materials is an important area of eHealth research. Studies indicate that health information on the Web is beyond the reading ability of average consumers [
Many researchers point to the need to reduce the gap between health literacy of the readers and the readability of consumer health materials [
Recognizing the limitations of these previous approaches, we set out to explore alternative measures that account for “average” familiarity with health terms among members of a convenience sample of consumers. The ability to recognize terms is important because readers need to associate health terms with their corresponding concepts in order to extract useful information from text. Thus, we decompose health vocabulary knowledge into two parts: (1) surface-level term familiarity, or recognition of the lexical form, and (2) concept-level term familiarity, or understanding of the underlying concept. In cognitive science, a concept can be viewed as a set of slots that can be filled with characteristics describing a class of objects or events [
We had previously developed a support vector machine regression model for predicting “familiarity likelihood scores” of consumer health vocabulary (CHV) terms using the empirical data from user studies evaluating “consumer-friendly display” names for medical concepts [
The primary goal of the research reported in this paper was to develop and apply a simple methodology for validating the CHV familiarity predictive model against actual empirically derived familiarity with various health terms among health consumers. The validation is distinct and independent from the empirical data used in deriving the model. Both surface-level (ie, recognition) and concept-level familiarity (ie, understanding of the underlying meaning) data were collected from participants. Surface-level familiarity was investigated because it corresponds with existing conventional approaches to assessing health vocabulary knowledge. The goal of concept-level familiarity assessment was to explore the potential of this novel approach and to characterize the relationship between the two familiarity levels. Finally, we sought to describe the effect of demographic factors (including health literacy and education level) on actual consumers’ scores. The following three hypotheses addressed the goals of the study:
Predicted familiarity likelihood level will have a significant effect on consumer surface-level term familiarity and consumer understanding of the underlying concept.
Demographic factors, including but not limited to health literacy and education level, will have a significant effect on both types of familiarity scores.
Consumers’ surface-level familiarity with terms will be greater than their understanding of the underlying concepts.
Consumers (n = 52) were recruited from Brigham and Women’s Hospital. Health literacy, assessed with Short Test of Functional Health Literacy in Adults (S-TOFHLA) [
Other demographic variables were self-reported using a brief questionnaire (
Demographic characteristics of the participants (n = 52)
Characteristic | n |
Gender | |
Male | 16 |
Female | 36 |
English proficiency | |
Native speakers | 44 |
Non-native speakers | 8 |
Highest education level | |
Below high school | 2 |
High school | 9 |
Some college | 20 |
College | 13 |
Graduate school | 8 |
Age | |
18-25 | 5 |
26-39 | 13 |
40-59 | 25 |
≥ 60 | 9 |
Race | |
White | 25 |
Black | 13 |
Hispanic | 8 |
Other | 6 |
Health literacy (S-TOFHLA score) | |
High health literacy (23-36) | 50 |
Moderate health literacy (17-22) | 2 |
A survey for assessing CHV surface-level (45 items) and concept-level (15 items) familiarity was developed, pilot tested, and implemented as described below. The process of instrument development consisted of two stages: (1) selecting health terms for inclusion in the test and (2) developing multiple-choice items for each term (
Survey development process (T = topic; L = predicted familiarity level)
Candidate CHV terms were selected from consumer health texts for three frequently visited MedlinePlus health topics: hypertension, back pain, and gastroesophageal reflux disease (GERD). One representative article on each selected topic was chosen from among consumer health sites listed by MedlinePlus. A final-year medical student manually extracted all health-related terms from each article. Next, all extracted terms were submitted to the predictive familiarity model [
The next stage of instrument construction involved developing multiple-choice test items assessing the two types of familiarity, operationally defined as the following:
1. Surface-level familiarity: ability to match written health terms with basic relevant associated terms at the super-category, location, or function level (eg, “biopsy” is a “test”)
2. Concept-level familiarity: ability to associate written terms with brief phrases describing the meaning or “gists” (eg, “biopsy” means “removing a sample of tissue”)
Surface-level familiarity items (
The layout of all test items was modeled on the Short Assessment of Health Literacy for Spanish-Speaking Adults (SAHLSA) [
Sample CHV instrument surface-level familiarity item
Incorporating the REALM procedure, SAHLSA requires the examinee both to correctly pronounce the target term and to select the key term. However, since our goal was to measure familiarity with written health expressions and concepts explicitly using a self-administered tool (eg, via the Web), the SAHLSA requirement for examinees to pronounce each target expression was dropped. The final test included surface-level familiarity items for all three health topics (questions 1-45) and concept-level familiarity items for GERD terms only (questions 46-60). The entire instrument is available in the Multimedia Appendix.
Sample CHV instrument concept-level familiarity item
Participants first completed the demographics survey, followed by the S-TOFHLA and the CHV familiarity survey (surface-level items followed by concept-level familiarity items). For scoring, each correct answer was awarded one point. Surface-level and concept-level familiarity scores were calculated separately. Regression analyses were performed at the 0.05 level of significance. Because the study is exploratory in nature, P values between 0.05 and 0.1 are reported for descriptive purposes, as indicating trends for further investigation.
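The scoring and reporting rules described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names, the dictionary-based response format, and the answer key are hypothetical, and the treatment of P values exactly equal to 0.05 or 0.1 is an assumption, since the text does not specify boundary handling.

```python
from typing import Dict, Tuple

# Item ranges taken from the instrument description:
# questions 1-45 are surface-level, questions 46-60 are concept-level.
SURFACE_ITEMS = range(1, 46)
CONCEPT_ITEMS = range(46, 61)


def score_survey(responses: Dict[int, str],
                 answer_key: Dict[int, str]) -> Tuple[int, int]:
    """Award one point per correct answer; score the two item sets separately.

    Unanswered items (missing from `responses`) earn no points.
    """
    surface = sum(1 for q in SURFACE_ITEMS
                  if responses.get(q) == answer_key[q])
    concept = sum(1 for q in CONCEPT_ITEMS
                  if responses.get(q) == answer_key[q])
    return surface, concept


def label_p_value(p: float, alpha: float = 0.05,
                  trend_cutoff: float = 0.1) -> str:
    """Label a P value using the study's reporting rule.

    Assumption: p < alpha is significant; alpha <= p < trend_cutoff is
    reported descriptively as a trend; anything else is not significant.
    """
    if p < alpha:
        return "significant"
    if p < trend_cutoff:
        return "trend"
    return "not significant"


# Hypothetical answer key and one hypothetical participant.
answer_key = {q: "a" for q in range(1, 61)}
responses = {q: "a" for q in list(range(1, 41)) + list(range(46, 56))}
responses.update({q: "b" for q in range(41, 46)})  # wrong surface answers

surface_score, concept_score = score_survey(responses, answer_key)
```

In this made-up example the participant answers 40 of the 45 surface-level items and 10 of the 15 concept-level items correctly, with items 56-60 left blank and therefore scored as zero.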
Three types of means were computed for each predicted familiarity likelihood level (“likely,” “somewhat likely,” and “unlikely” to be familiar): total surface-level familiarity, GERD surface-level familiarity, and GERD concept-level familiarity (
Mean surface-level and concept-level familiarity scores
Predicted Familiarity Likelihood | Total Surface-Level Familiarity | GERD Surface-Level Familiarity | GERD Concept-Level Familiarity |
Likely | 13.80 (1.97) | 4.75 (0.81) | 3.83 (1.22) |
Somewhat likely | 12.92 (2.60) | 4.54 (1.02) | 3.94 (1.04) |
Unlikely | 9.53 (3.44) | 3.42 (1.42) | 3.04 (1.31) |
Total surface-level familiarity and GERD concept-level familiarity were the dependent variables of hypotheses 1 and 2. GERD surface-level familiarity was used in computing the gap between GERD surface-level and concept-level familiarity, the dependent variable for hypothesis 3.
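As a rough illustration of the gap used for hypothesis 3, the sketch below subtracts the mean GERD concept-level score from the mean GERD surface-level score at each predicted familiarity level, using the means reported in the table of mean familiarity scores. The variable and function names are hypothetical, and differencing group means is only an approximation for illustration; the study's own gap variable was presumably computed from individual participants' scores.

```python
from typing import Dict

# Mean GERD scores by predicted familiarity likelihood level,
# copied from the table of mean familiarity scores.
table_means: Dict[str, Dict[str, float]] = {
    "likely": {"gerd_surface": 4.75, "gerd_concept": 3.83},
    "somewhat likely": {"gerd_surface": 4.54, "gerd_concept": 3.94},
    "unlikely": {"gerd_surface": 3.42, "gerd_concept": 3.04},
}


def surface_concept_gap(means: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Return surface-level minus concept-level mean for each level."""
    return {
        level: round(scores["gerd_surface"] - scores["gerd_concept"], 2)
        for level, scores in means.items()
    }


gaps = surface_concept_gap(table_means)
for level, gap in gaps.items():
    print(f"{level}: {gap:.2f}")
```

With these means the differences are 0.92 ("likely"), 0.60 ("somewhat likely"), and 0.38 ("unlikely"): positive at every level, and largest for the terms predicted most likely to be familiar.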
The dependent variable, total surface-level term familiarity score, was regressed on seven independent variables: predicted familiarity likelihood level, gender, English proficiency, highest education level, age, race, and health literacy level (S-TOFHLA scores). Linear regression found a statistically significant effect (
GERD concept-level familiarity score was regressed on all seven independent variables from the previous regression analysis plus GERD surface-level familiarity. Linear regression found statistically significant effects of predicted familiarity likelihood level (
While previous regression analysis indicated that GERD surface-level familiarity score was a significant predictor of GERD concept-level familiarity, the concept-level familiarity consistently lagged behind surface-level familiarity at all three levels (see
Although preliminary in nature, this study presents an initial evaluation of the first model for estimating consumer familiarity with health-specific terms. The findings confirmed hypotheses 1 and 3 and partially confirmed hypothesis 2. Confirmation of hypothesis 1 provided initial validity evidence for the CHV familiarity likelihood model [
Partial confirmation of hypothesis 2 and confirmation of hypothesis 3 both point to limitations of the model with respect to its ability to identify “consumer-unfriendly” words. Part of the variance in readers’ performance is likely to be related to demographic characteristics, which are not accounted for in the model. With further research, it may be possible to adjust predicted familiarity likelihood categories for some target populations on the basis of known effects of demographic variables. However, identifying the full range of meaningful demographic variables is not realistic. Moreover, most sites are developed for a broad range of health consumers who represent a diverse range of competencies and experiences. This limitation is not unique to our approach but is true for all attempts to evaluate the difficulty of terms or a text. While individualized prediction of text difficulty on the basis of a model is desirable, it is also much more error prone than population-wide predictions because most predictive models are based on population statistics or empirical expert knowledge. Any prediction is necessarily an approximation, but a high-quality approximation is of considerable value.

Presently, our predictive model framework also does not make a theoretical distinction between surface-level familiarity and conceptual understanding and does not make provision for the possible uneven gap between the two. If the uneven gap phenomenon is confirmed, then the “easiness” of terms predicted as highly likely to be familiar may be deceptive. Answering this question requires a strong operational definition of sufficient concept knowledge and a way of assessing it. The present instrument is an exploratory step in the direction of concept knowledge measurement. A satisfactory instrument should reconcile the goals of assessing a complex and multifaceted construct while being relatively quick and easy to administer.
While most of the study results corresponded to our research hypotheses, the lack of significant effects of most demographic variables, particularly educational level, is surprising and may be due to sampling bias. It is possible that uneven representation obscured any education effects: 41 out of 52 participants had at least some college education. Note that education is a proxy for general literacy, which is only one component of health literacy [
Follow-up work includes validating and possibly adjusting the algorithm for specific populations, evaluating the role of potentially influential demographic variables in designs where these variables are represented across a broad range of values, and developing a formula that would assign a single-value text difficulty on the basis of the present algorithm. The calibration of such formulae in order to estimate the desired scores for various populations would require a set of extensive psychometric studies that are beyond the scope of most informatics research programs. However, developing the algorithm and testing its effectiveness against existing readability formulas are well within the capabilities of consumer health informatics research. It is also essential to develop methods to explore consumer understanding of health concepts in-depth, as the current study only touches the surface of this important topic.
This research was supported by the Intramural Research Program of the US National Library of Medicine, US National Institutes of Health (AK, TT, AB) and NIH grant R01 LM007222-05 (JC, LN, QZ). The authors thank Ilyse Rosenberg for her contribution to developing the instrument and Cara Hefner for assistance with the data collection.
None declared.
60-item questionnaire
CHV: consumer health vocabulary
GERD: gastroesophageal reflux disease
SAHLSA: Short Assessment of Health Literacy for Spanish-Speaking Adults
S-TOFHLA: Short Test of Functional Health Literacy in Adults
REALM: Rapid Estimate of Adult Literacy in Medicine