Website Quality Indicators for Consumers
Background: The rating tool DISCERN was designed for use by consumers without content expertise to evaluate the quality of health information. There is some evidence that DISCERN may be a valid indicator of evidence-based website quality when applied by health professionals. However, it is not known if the tool is a valid measure of evidence-based quality when used by consumers. Since it is a lengthy instrument requiring training in its use, DISCERN may prove impractical for use by the typical consumer. It is therefore important to explore the validity of other simpler potential indicators of site quality such as Google PageRank.
Objective: This study aimed to determine (1) whether the instrument DISCERN is a valid indicator of evidence-based Web content quality for consumers without specific mental health training, and (2) whether Google PageRank is an indicator of website content quality as measured by an evidence-based gold standard.
Methods: This was a cross-sectional survey of depression websites using consumer and health professional raters. The main outcome measures were (1) site characteristics, (2) evidence-based quality of content as measured by evidence-based depression guidelines, (3) DISCERN scores, (4) Google PageRank, and (5) user satisfaction.
Results: There was a significant association between evidence-based quality ratings and average DISCERN ratings both for consumers (r = 0.62, P = .001) and health professionals (r = 0.80, P < .001). Consumer and health professional DISCERN ratings were significantly correlated (r = 0.77, P < .001). The evidence-based quality score correlated with Google PageRank (r = 0.59, P = .002). However, the correlation between DISCERN scores and user satisfaction was higher than the correlation between Google PageRank and user satisfaction.
Conclusions: DISCERN has potential as an indicator of content quality when used either by experts or by consumers. Google PageRank shows some promise as an automatic indicator of quality.
(J Med Internet Res 2005;7(5):e55)
There has been widespread concern about the quality of Web-based health information designed for consumers. In response, a number of initiatives have been developed to assist consumers in locating quality health information on the Web. These include the use of quality labels based on compliance with codes of conduct (eg, HON code), portals that provide a gateway to websites of “high quality” (eg, OMNI), and rating tools designed for consumer use.
One rating tool that shows particular promise is DISCERN, an instrument designed for use by consumers and providers “to judge the quality of written information about treatment choices” [3, p. 106]. This tool is widely recommended and used by authoritative sources for the evaluation of websites. However, it has not yet been convincingly established that DISCERN, particularly when used by consumers, is a valid indicator of quality when compared against an evidence-based gold standard.
Three studies have investigated the relationship between the DISCERN ratings of experts and “scientific” quality. Two of the studies reported a significant association between DISCERN and scientific accuracy, but the authors of the third study found “no clear relationship between methodological (DISCERN) and medical-scientific quality.” Unfortunately, except for the Griffiths and Christensen study, it is unclear if the standard against which the DISCERN ratings were compared was based on systematic reviews of the evidence. Moreover, in each study, ratings were made by health professionals. To our knowledge, there has been no assessment to date of the validity of DISCERN as measured by an evidence-based gold standard when used by consumers without technical expertise.
Although the developers trialed DISCERN with self-help group users in a research context, it is a lengthy instrument, and it is not clear if individual consumers would use DISCERN in practice. Other simpler potential indicators of site quality include those based on the link structure of the World Wide Web. For example, Google PageRank is an automatically computed measure of the importance of a website based on the number and importance of Web pages linking to it. However, there is little evidence as to the validity of link structure as an indicator of quality.
The current study, therefore, sought to determine the following for depression information websites: (1) whether DISCERN is a valid indicator of evidence-based content quality for consumers without specific mental health training, and (2) whether Google PageRank is an indicator of content quality. Depression websites were selected because depression is a leading cause of disease burden, there is a high level of unmet need among people with depression, depression is one of the most common reasons consumers access health information on the Internet, and evidence-based guidelines for depression management are available.
Twenty-four depression websites with a Google PageRank were selected from the Depression Directory of the DMOZ Open Directory Project website (n = 127). Three sites for each Google PageRank score within the range 0 to 7 were randomly selected using the R Project statistical package to ensure a range of sites were represented. Each of the selected sites was then captured (in April 2003) and electronically archived for assessment using purpose-built software. External links from these sites were excluded.
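The selection step can be illustrated with a minimal sketch of stratified random sampling. This is not the authors' R code: it is written in Python, and the function name `sample_per_pagerank`, the `(url, pagerank)` data layout, the seed, and the placeholder directory are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_per_pagerank(sites, per_stratum=3, seed=0):
    """Randomly pick `per_stratum` sites for each PageRank value.

    `sites` is a list of (url, pagerank) pairs. The layout and the seed
    are illustrative assumptions, not details taken from the study.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for url, rank in sites:
        strata[rank].append(url)

    selected = {}
    for rank in sorted(strata):
        pool = strata[rank]
        if len(pool) < per_stratum:
            raise ValueError(f"PageRank {rank} has only {len(pool)} sites")
        # Sample without replacement within the stratum.
        selected[rank] = rng.sample(pool, per_stratum)
    return selected


# Hypothetical directory: 5 placeholder sites at each PageRank from 0 to 7.
directory = [(f"site-{rank}-{i}.example", rank)
             for rank in range(8) for i in range(5)]
chosen = sample_per_pagerank(directory)
total = sum(len(urls) for urls in chosen.values())
print(total)  # 8 strata x 3 sites = 24
```

Sampling within each PageRank value, rather than across the whole directory, is what guarantees that low-ranked and high-ranked sites are equally represented in the final set of 24.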
Sites were rated online by four researchers/health professionals with expertise in depression and three consumers with a history of depression but no professional experience in mental health or research. Two of the health professionals (KG, HC) rated the site using an evidence-based gold standard. They also rated the characteristics of each website. The other two health professionals (AJ, RK) and the three consumers rated the sites using the DISCERN measures. All raters provided satisfaction measures for each site. Sites were presented in a different random order for each rater, and each rater was supplied with a pro forma rating sheet. The consumer raters were employed as casual research assistants during the study.
Each site was rated on a range of attributes, including ownership structure, scope, editorial arrangement, and legal policies.
Evidence-Based Guideline Score
Evidence-based quality was assessed using the depression guidelines produced by the Centre for Evidence Based Mental Health (CEBMH) at Oxford. The guideline score was the number of CEBMH items (maximum 20) correctly endorsed by the website. In the current study, the correlation between evidence-based guideline scores for the two health professional raters was 0.94 (P < .001). An average guideline score was therefore computed for the two raters.
The DISCERN instrument comprises 15 items (each rated from 1 to 5) and an additional “overall quality” item (rated 1 to 5) [, ]. Raters in the current study were informed that the DISCERN questionnaire was designed to assess the quality of information about medical treatments and that “In this study we are focusing on the quality of web sites related to the treatment of depression.” Each rater was provided with the DISCERN instrument, which includes hints for rating each item, and the DISCERN handbook, which contains detailed information about the scoring of DISCERN items. Items in DISCERN include questions about the reliability of the publication (eg, are information sources specified, is it clear where these information sources were produced, degree to which the discussion is balanced) and the quality of information on treatments (eg, description of the mechanism, benefits, risks of possible choices and inclusion of multiple treatment options).
Previous research has demonstrated acceptable inter-rater agreement on individual items of the instrument when used by expert health professionals and “fair” agreement among consumers . The original version of the test used the overall quality score as the measure of quality. However, subsequently, a number of studies employing DISCERN have used a measure of quality based on a total DISCERN score derived by cumulating scores across the first 15 DISCERN items (minimum score = 15; maximum = 75) (eg, [ , , , ]). This measure shows acceptable inter-rater agreement (r = 0.88 [ ], r = 0.82 [ ]) and has been reported to correlate with the overall quality rating (r = 0.8 [ ]). In the current study, the correlation between the total DISCERN score and the overall quality item score was 0.91 for consumers and 0.92 for experts. The DISCERN results reported in the primary analyses are therefore confined to the total DISCERN measure.
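As a minimal sketch of the total DISCERN measure described above, the following Python function sums the first 15 item ratings and checks that each lies in the 1 to 5 range. The function name `discern_total` and the example ratings are assumptions for illustration only; they are not data from the study.

```python
def discern_total(item_ratings):
    """Sum the first 15 DISCERN items (each rated 1 to 5) into a total score.

    The overall quality item is deliberately excluded, so the total
    ranges from 15 to 75.
    """
    if len(item_ratings) != 15:
        raise ValueError("expected ratings for exactly 15 items")
    if any(not 1 <= rating <= 5 for rating in item_ratings):
        raise ValueError("each item must be rated from 1 to 5")
    return sum(item_ratings)


# Made-up ratings for one hypothetical website.
example_ratings = [2] * 10 + [4] * 5
print(discern_total(example_ratings))  # 10 * 2 + 5 * 4 = 40
```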
The correlation between the DISCERN ratings for the two health professionals was 0.86 (P < .001). Intercorrelations between DISCERN ratings for the three consumer raters were 0.78, 0.77, and 0.68 (P < .001). An average score was therefore computed for the health professionals and the consumers. The DISCERN ratings for the health professionals and consumers were significantly correlated (r = 0.77, P < .001). A paired t test demonstrated that mean DISCERN scores for the two types of rater did not differ significantly across the 24 websites (t(23) = 0.64, P = .53).
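The paired comparison above treats each website as its own control: the two rater groups score the same sites, and the test is run on the per-site differences. A minimal sketch of the paired t statistic follows, assuming hypothetical scores; the function name `paired_t` is an assumption, and the P value (which requires the t distribution with n − 1 degrees of freedom) is not computed here.

```python
import math


def paired_t(scores_a, scores_b):
    """Return (t, df) for a paired t test on two equal-length score lists."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if var_diff == 0:
        raise ValueError("differences have zero variance")
    t = mean_diff / math.sqrt(var_diff / n)
    return t, n - 1


# Hypothetical average DISCERN totals for four sites (not study data).
professionals = [40, 35, 50, 45]
consumers = [38, 36, 47, 41]
t_stat, df = paired_t(professionals, consumers)
print(round(t_stat, 3), df)
```

With 24 websites, as in the study, the same function would return 23 degrees of freedom.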
Website satisfaction was measured using a series of 9 items developed for the purpose of the study. Items included questions about the target website’s perceived usefulness, relevance to people with depression, trustworthiness, author knowledge, esthetics, and whether the site could be easily understood, easily navigated, and would be recommended. A total satisfaction score was calculated by computing the total number of satisfaction items endorsed by the rater (minimum 0, maximum 9). The correlation between satisfaction ratings for the two evidence-based guideline health professional raters was 0.86 (P < .001) and for the two DISCERN health professional raters was 0.83. Intercorrelations between satisfaction measures for the three consumer raters were significant in two of the three cases (rater 1 vs 2: r = 0.60, P = .002; rater 2 vs 3: r = 0.58, P = .003; rater 1 vs 3: r = 0.26, P = .22). Therefore, although the satisfaction measure for the evidence-based guideline health professional raters was based on their average score, and an average score was also computed for the DISCERN health professional raters, the satisfaction measures for the three consumer raters were treated separately.
Google ToolBar PageRank
Google PageRank is employed by the Google search engine as a measure of the “importance” of a Web page. These PageRank values can range from 0 to 10, with higher values indicating greater importance. PageRanks are based on an iterative algorithm developed by Google founders Brin and Page that takes into account the number and importance of pages that link to a website. The importance of pages linking to a site is assessed according to the number and importance of sites linking to those pages. The PageRank score on the Google toolbar is a transformed function (conjectured to be logarithmic or distributional) of a raw Google PageRank score. Raw scores are very small positive numbers that sum to 1.0 over the entire Web and are known to follow a power-law distribution. Google PageRank differs from the ranking order in Google search results in that PageRank is query independent, whereas the ranking order in Google search results takes into account PageRank together with many other variables, such as frequency of occurrence of search terms on a page, anchor text used to link to sites, and a large number of other tuning variables not disclosed by the company.
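The iterative idea behind raw PageRank can be sketched with a small power iteration over a toy link graph. This is a textbook-style illustration, not Google's production algorithm and not the toolbar transformation, which is undisclosed. The damping value of 0.85 follows the figure suggested in the Brin and Page paper; the function name `pagerank`, the handling of pages without outgoing links, and the four-page graph are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over a dict {page: [pages it links to]}.

    Scores are normalized so that they sum to 1.0 across all pages.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages with no outgoing links is spread evenly
        # (one common convention, assumed here).
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


# Toy web: page "c" is linked to by every other page; nothing links to "d".
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph the page with the most incoming links ends up with the highest score, and a page inherits importance from the pages that link to it, which is the property the study relies on when treating PageRank as a candidate quality indicator.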
The Google PageRank for each site was obtained by downloading the Google toolbar and recording the integer number attached to the toolbar. The lowest and highest identified page ranks in the DMOZ depression directory were 0 and 7, respectively.
Intercorrelations between evidence-based scores, DISCERN, and overall satisfaction were computed using Pearson r tests. (Note that when these analyses were recomputed using non-parametric Spearman rho tests, similar patterns of results were observed.) Site quality was assessed as a function of site characteristic using independent t tests (with Levene’s correction in the case of unequal variances). Differences between evidence-based scores as a function of individual satisfaction items were analyzed separately using independent t tests, except that no analysis was performed for items for which the sample sizes in a cell were very small (less than 6 sites). Multiple independent t tests were also used in analyzing the effects of site characteristics and for individual satisfaction items because the data were not amenable to an overall multivariate analysis such as a multiple regression or a MANOVA followed by contrasts corrected for multiple comparisons. For example, there were insufficient websites given the number of independent predictors to apply multiple regression to the data. The probability values cited in the results tables and text therefore refer to error rate per comparison. Given that a large number of comparisons were conducted in this study, the chance of reporting one or more spuriously significant results is high. For this reason, patterns of results, rather than isolated findings, are emphasized in reporting and interpreting the study results, particularly with respect to the satisfaction items. With the exception of tests of the significance of differences between dependent correlations, which were carried out using the SISA online calculator, all analyses were carried out using SPSS version 12.0.1.
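For readers who want to check a reported correlation by hand, the following sketch computes a Pearson r from two score lists and converts an r value to the t statistic (with n − 2 degrees of freedom) on which its significance test is based. It uses only the Python standard library and does not reproduce SPSS or the SISA calculator; the function names are assumptions, and the P value itself is left to a t table or statistical package.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length lists with at least 3 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic (df = n - 2) for testing whether a correlation is zero."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# A correlation of 0.62 across 24 websites, as in the abstract.
t_stat = t_from_r(0.62, 24)
print(round(t_stat, 2), "on", 24 - 2, "degrees of freedom")
```

Because the study has only 24 sites, a correlation needs to be fairly large before this t statistic reaches conventional significance, which is one reason the text emphasizes patterns of results over isolated findings.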
Level of Quality and Satisfaction
Overall, the mean evidence-based score was low (3.6, SD = 3.9), and the mean DISCERN ratings for both the health professional and consumer raters fell in the poor to average range (health professionals: mean = 37.8, SD = 17.0; consumers: mean = 36.3, SD = 10.6). Mean satisfaction scores were low for the evidence-based raters (mean = 2.8, SD = 2.1), average for the health professional DISCERN raters (mean = 4.3, SD = 2.6), and average for the consumer DISCERN raters (rater 1: mean = 6.1, SD = 2.3; rater 2: mean = 4.7, SD = 3.0; rater 3: mean = 4.4, SD = 3.7).
Association Between Evidence-Based Quality and the Potential Indicators of Quality
There was a strong correlation between the average evidence-based score and the average DISCERN ratings for the health professionals (r = 0.80, P < .001) and a moderately high correlation for consumers (r = 0.62, P = .001). For health professionals, intercorrelations between DISCERN ratings and evidence-based scores for each of the items considered separately ranged from 0.37 (P = .08) for Item 5 (Is it clear when the information used or reported in the publication was produced?) to 0.88 (P < .001) for Item 3 (Is it relevant? eg, Does the publication address the questions readers might ask and are the treatment recommendations realistic or appropriate?). For consumers, this range was 0.18 (P = .40) for Item 5 to 0.68 (P < .001) for Item 3.
There was a moderate correlation between the evidence-based guideline score and Google PageRank (r = 0.59, P = .002). The size of this correlation was almost the same as that between the consumer DISCERN ratings and evidence-based scores.
Associations Between Quality Measures and Satisfaction
Evidence-based ratings were significantly correlated with overall rater satisfaction (r = 0.85, P < .05). Sites that were judged by consumers to have useful treatment information, to describe what a consumer might wish to know about depression, to be trustworthy, and to be written by people who knew about depression showed better evidence-based quality, at least for 2 of the 3 consumers. There were no significant differences in evidence-based scores for consumers as a function of the judged attractiveness of the site or whether they would recommend it to someone else. Sites judged by health professional raters as useful, relevant, written by a knowledgeable author, and worthy of recommendation were of higher evidence-based quality. There were no significant differences in evidence-based scores as a function of whether the site was judged by health professionals to be navigable. The pattern of findings for the health professionals who provided evidence-based ratings was similar to the pattern of findings for health professionals who conducted DISCERN ratings.
Consumer DISCERN ratings were strongly correlated with satisfaction ratings (rater 1: r = 0.74, P < .001; rater 2: r = 0.85, P < .001) as were expert DISCERN ratings (r = 0.95, P < .001). By contrast, PageRank was correlated with consumer satisfaction for one rater only (rater 1: r = 0.45, P = .03; rater 2: r = 0.35, P > .05; rater 3: r = 0.21, P > .05) and was only moderately correlated with expert satisfaction ratings (r = 0.50, P = .01). This difference in correlation for the DISCERN and PageRank conditions was significant for two of the consumers and also for the health professionals (consumer rater 1: difference in r = 0.29, 95% CI = −0.02 to 0.60; rater 2: difference in r = 0.50, 95% CI = 0.18 to 0.82; rater 3: difference in r = 0.64, 95% CI = 0.26 to 1.02; health professionals: difference in r = 0.45, 95% CI = 0.23 to 0.67).
This study provides the first published demonstration that DISCERN is an indicator of evidence-based website quality when used by consumers. It also confirms our previous finding that DISCERN is an indicator of evidence-based quality when used by health professionals.
The finding that DISCERN may be a valid means for consumers to identify websites of high quality and satisfaction has practical implications for consumers. It is unlikely that individual consumers will invest the time required to use DISCERN solely for their own purposes. However, used with caution and an understanding that it is not a perfect predictor of evidence-based quality, DISCERN may be relevant to consumer organizations interested in assembling lists of links to high quality websites for their membership or for visitors to their website. Moreover, the finding that DISCERN may be useful for consumers raises the possibility that DISCERN might also be validly applied by other nontechnical experts, an observation of potential relevance to any organization or Web constructor interested in inexpensively assembling quality portals.
Interestingly, in the case of consumers, Google PageRank is as strong an indicator of evidence-based quality as DISCERN. Thus, this measure may be a simple and practical means by which individual consumers can evaluate, albeit imperfectly, the likely quality of mental health sites. Apart from the time required to download the Google toolbar in the first instance, its use requires minimal expertise and time. In addition, PageRank is likely to be convenient for users seeking health information on the Web since they typically do so by means of a search rather than via directories or portals [, ]. Since the Google PageRank was correlated less highly with satisfaction than was DISCERN, the latter may be the preferred rating tool for organized groups for whom the overhead in learning to use DISCERN can be justified. However, even in this circumstance, it is possible that Google PageRank could be used as a screening device to eliminate likely sites of low quality and the more time consuming DISCERN instrument then applied to the remaining sites. Alternatively, the reduction in sites may render the task of assessment by a content expert feasible.
These results represent a first step toward identifying tools that consumers who are not content experts can use as valid indicators of the evidence-based quality of websites. Further research is required to explore the utility of DISCERN and Google PageRank. In particular, it is important to determine optimal cutoff points for identifying higher quality sites and to explore the sensitivity and specificity of the measures. It is also of interest to document the relative utility of DISCERN for nontechnical raters of differing educational backgrounds, experience with the instrument, and Web experience. Finally, given that not one but many indicators may be useful in identifying high-quality sites, there may be value in identifying optimal combinations of multiple indicators of quality. There is also much to be gained by further identifying automatic indicators of the type that could be factored into the relevance algorithms of a specialized focused search engine.
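The kind of cutoff analysis proposed above can be sketched as follows. The code flags a site as likely high quality when its indicator score meets a cutoff and compares the flags with a gold-standard classification. Everything in the example is hypothetical: the function name `sensitivity_specificity`, the PageRank values, and the gold-standard labels are invented to show the calculation and are not results from this study.

```python
def sensitivity_specificity(scores, is_high_quality, cutoff):
    """Sensitivity and specificity of the rule `score >= cutoff`.

    `is_high_quality` holds the gold-standard label for each site.
    """
    tp = fn = tn = fp = 0
    for score, good in zip(scores, is_high_quality):
        flagged = score >= cutoff
        if good and flagged:
            tp += 1
        elif good:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one high-quality and one low-quality site")
    return tp / (tp + fn), tn / (tn + fp)


# Invented example: eight sites, one per PageRank value from 0 to 7.
pageranks = [0, 1, 2, 3, 4, 5, 6, 7]
gold = [False, False, False, True, False, True, True, True]

for cutoff in range(1, 8):
    sens, spec = sensitivity_specificity(pageranks, gold, cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")

sens_at_4, spec_at_4 = sensitivity_specificity(pageranks, gold, 4)
```

Scanning the cutoffs in this way makes the trade-off explicit: a low cutoff misses few good sites but lets many poor ones through, whereas a high cutoff does the reverse.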
This study was funded by a National Health and Medical Research Council Australia Program Grant to the Centre for Mental Health Research and by a grant from beyondblue: the national depression initiative. The work described here was carried out independently of the funders.
KG conceived and designed the study, acted as a health professional rater (evidence-based), analyzed and interpreted the data, and wrote the paper. HC designed the study, acted as a health professional rater (evidence-based), and edited the paper. Dr. Simon Blomberg generated the list of websites, supervised the consumer research assistants, and collected the data. He and Kelly Blewitt set up the database and entered the data. Three consumers provided DISCERN and satisfaction ratings and were employed as research assistants for this purpose. Professor Anthony Jorm and Dr. Richard O’Kearney (health professionals) provided DISCERN and satisfaction ratings for each site. Anthony Bennett developed the Web-capture software.
Conflicts of Interest
- Eysenbach G, Powell J, Kuss O, Sa ER. Empirical studies assessing the quality of health information for consumers on the world wide web: a systematic review. JAMA 2002;287(20):2691-2700 [FREE Full text] [Medline] [CrossRef]
- . Quality criteria for health related websites. J Med Internet Res 2002;4(3):e15. [CrossRef]
- Charnock D, Shepperd S, Needham G, Gann R. DISCERN: an instrument for judging the quality of written consumer health information on treatment choices. J Epidemiol Commun Health 1999 Feb;53(2):105-111 [FREE Full text] [Medline]
- Griffiths KM, Christensen H. The quality and accessibility of Australian depression sites on the World Wide Web. Med J Aust 2002 May 20;176 Suppl:S97-S104 [FREE Full text] [Medline]
- Bartels U, Hargrave D, Lau L, Esquembre C, Humpl T, Bouffet E. Analyse padiatrisch neuro-onkologischer Informationen in deutschsprachigen Internetseiten. Klin Padiatrie 2003;215(6):352-357. [CrossRef]
- Turp J, Gerds T, Neugebauer S. Myoarthropathien des Kausystems: Beurteilung der Qualitat von Patienteninformationen im Weltweiten Netz. Zeitschrift fur arztliche Fortbildung und Qualitatssicherung. In: Zusammenarbeit mit der Kaiserin-Friedrich-Stiftung fur das arztliche Fortbildungswesen 2001;95(8):539-547.
- Murray C, Lopez A. The global burden of disease: a comprehensive assessment of mortality and disability from diseases, injuries and risk factors in 1990 and projected to 2020. Cambridge, Mass: Harvard University Press; 1996.
- Andrews G, Henderson S. Unmet Need in Psychiatry: Problems, Resources, Responses. Cambridge, UK: Cambridge University Press; 2000.
- Taylor H. Explosive growth of 'cyberchondriacs' continues. URL: http://www.harrisinteractive.com/harris_poll/index.asp?PID=117 [WebCite Cache]
- The R Project for Statistical Computing. URL: http://www.r-project.org/ [accessed 2005 Oct 24] [WebCite Cache]
- . A systematic guide for the management of depression in primary care [treatment]. URL: http://www.psychiatry.ox.ac.uk/cebmh/guidelines/depression/depression1.html [accessed 2005 Oct 26] [WebCite Cache]
- Charnock D. The DISCERN handbook. Quality criteria for consumer health information on treatment choices. Oxford, UK: Radcliffe Medical Press Ltd; 1998.
- Hargrave D, Bartels U, Lau L, Esquembre C, Bouffet E. [Quality of childhood brain tumour information on the Internet in French language]. Bull Cancer 2003 Jul;90(7):650-655. [Medline]
- Ademiluyi G, Rees CE, Sheard CE. Evaluating the reliability and validity of three tools to assess the quality of health information on the Internet. Patient Educ Couns 2003 Jun;50(2):151-155. [Medline]
- Brin S, Page L. Anatomy of a large-scale hypertextual web search engine. Presented at: 7th International World Wide Web Conference; April 14-18, 1998; Brisbane, Australia.
- Pandurangan G, Raghavan P, Upfal E. Using PageRank to Characterize Web Structure. Presented at: 8th Annual International Conference on Combinatorics and Computing (COCOON); 2002; Singapore.
- SISA online statistical analysis. URL: http://home.clara.net/sisa/correl.htm [accessed 2005 Mar 11] [WebCite Cache]
- . SPSS (12.0.1) for Windows. Chicago, Ill: SPSS. URL: http://www.spss.com/spss [accessed 2005 Mar 11] [WebCite Cache]
- Eysenbach G, Köhler C. How do consumers search for and appraise health information on the world wide web? Qualitative study using focus groups, usability tests, and in-depth interviews. BMJ 2002 Mar 9;324(7337):573-577 [FREE Full text] [PMC] [Medline] [CrossRef]
- Fox S, Fallows D. Internet health resources. Washington, DC: Pew Internet & American Life Project; 2003.
- Griffiths KM, Christensen H. Quality of web based information on treatment of depression: cross sectional survey. BMJ 2000 Dec 16;321(7275):1511-1515 [FREE Full text] [PMC] [Medline] [CrossRef]
- Cohen J. Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum Associates; 1988.
Edited by G. Eysenbach; submitted 20.01.05; peer-reviewed by H Witteman, S Shepperd; comments to author 18.02.05; revised version received 09.03.05; accepted 21.10.05; published 15.11.05
© Kathleen M Griffiths, Helen Christensen. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 15.11.2005. Except where otherwise noted, articles published in the Journal of Medical Internet Research are distributed under the terms of the Creative Commons Attribution License (http://www.creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited, including full bibliographic details and the URL (see "please cite as" above), and this statement is included.