Published on 27.08.13 in Vol 15, No 8 (2013): August
A Method for the Design and Development of Medical or Health Care Information Websites to Optimize Search Engine Results Page Rankings on Google
Background: The Internet is a widely used source of information for patients searching for medical/health care information. While many studies have assessed existing medical/health care information on the Internet, relatively few have examined methods for design and delivery of such websites, particularly those aimed at the general public.
Objective: This study describes a method of evaluating material for new medical/health care websites, or for assessing those already in existence, which is correlated with higher rankings on Google's Search Engine Results Pages (SERPs).
Methods: A website quality assessment (WQA) tool was developed using criteria related to the quality of the information to be contained in the website in addition to an assessment of the readability of the text. This was retrospectively applied to assess existing websites that provide information about generic medicines. The reproducibility of the WQA tool and its predictive validity were assessed in this study.
Results: The WQA tool demonstrated very high reproducibility (intraclass correlation coefficient=0.95) between 2 independent users. A moderate to strong correlation was found between WQA scores and rankings on Google SERPs. Analogous correlations were seen between rankings and readability of websites as determined by Flesch Reading Ease and Flesch-Kincaid Grade Level scores.
Conclusions: The use of the WQA tool developed in this study is recommended as part of the design phase of a medical or health care information provision website, along with assessment of readability of the material to be used. This may ensure that the website performs better on Google searches. The tool can also be used retrospectively to make improvements to existing websites, thus, potentially enabling better Google search result positions without incurring the costs associated with Search Engine Optimization (SEO) professionals or paid promotion.
(J Med Internet Res 2013;15(8):e183)
- health care information;
- patient education;
- medical informatics;
- generic drugs;
- website development;
- quality assessment
A multitude of studies have assessed the use, quality, and availability of medical/health care information on the Internet in areas as diverse as inflammatory bowel disease, orthodontics [ , ], pain [ ], cancer [ - ], and mental health [ , ], among many others. Such studies often examine information available to, and used by, people in particular geographic areas, for example, pediatric asthma in Saudi Arabia [ ], preconception care in Italy [ ], and medical information in Brazil [ ] and Portugal [ ]. A PubMed search for research into online medical information (including, for example, the use of resources such as Wikipedia or Google in medical education and the availability of information for patients) returns thousands of results. This reflects the fact that the Internet has become a source of medical information for patients and health care professionals alike, as shown by the increasing prevalence of Internet use and of the social networking associated with “Web 2.0” for sourcing and sharing information online [ ].
In the area of generic medicines, misconceptions and misinformation exist that are easily disseminated and perpetuated online. Because health care professionals have themselves expressed poor opinions of generics in the past, it is challenging to communicate accurate information to the general public about the medicines they are taking. There is a need to provide accurate information, to dispel myths, and to counter misinformation, but also to present the material in a manner that is accessible to the intended audience. It has been reported, for example, that myths and uncertainties about generic medicines abound, particularly among patients, and that accurate information can be difficult to find [ ].
A good quality medical or health care information website could be defined as one that contains accurate and unbiased information on all aspects of its topic (both positive and negative) and that can be easily read and understood by its target audience. Where the intended audience is the general public, readability will be a key factor in the website's success, defined here by the number of hits the website receives, which is indicative of its ranking in Internet search engine results. After all, if a website contains exemplary information but cannot be easily read and understood by its audience, it may go largely undiscovered among the plethora of information available on the Internet. This study focused on websites that were neither advertised nor promoted (ie, websites appearing on a Search Engine Results Page (SERP) because they were ranked and returned by Google’s algorithms, not as a result of paid advertisement or promotion).
While the availability and accuracy of existing online medical/health care information continue to be studied, much less work appears to have been done on the development of medical information websites, in particular websites aimed at providing accurate and unbiased medical information to the general public. A PubMed search performed on February 22, 2013, using the search term “development medical information website”, returned 28 articles specifically related to the development of medical/health care information websites.
The objective of this paper was to provide a method for planning the information to be included in medical information websites and for presenting that information in a readable manner. Search Engine Optimization (SEO) can be a critical factor in achieving top-ranking search engine results, but the cost of online advertising or of SEO professionals to promote a website may be prohibitive for government or advocacy groups wishing to impart good quality medical/health care information. The tools and techniques described in this paper are intended not only to ensure the quality of the information in a website but also to give the website a better chance of appearing higher on a Google SERP, without incurring significant additional cost.
To ensure a high-quality medical information website, two factors should be considered in its development: (1) the information it will present (quality, accuracy, comprehensiveness, balance, impartiality, etc) and (2) the ability of the information to be read and understood by the target audience.
Based on these factors, an assessment tool was developed that may be used to prospectively design the content of an optimized website. This study reports the composition of that tool and its validation through retrospective assessment of existing sites.
Information Gathering and Website Quality Assessment Tool Development
A tool for assessing websites that impart information on generic drugs was developed. This Website Quality Assessment (WQA) tool consisted of a series of yes/no questions, with 1 point awarded for each item of positive or correct information (see). No points were awarded for missing or inaccurate information. Questions that could not be answered were designated “not applicable” (N/A), and no score was awarded for them. The overall WQA score for each website was the total of the scores assigned to all assessment questions.
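The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the question IDs and the encoding of answers (True for a point, False for missing or inaccurate information, None for N/A) are assumptions made for the example.

```python
from typing import Dict, Optional, Tuple

# Assumed encoding of one rater's answers to the WQA questions:
#   True  -> positive or correct information present (1 point)
#   False -> information missing or inaccurate (0 points)
#   None  -> question could not be answered, ie, N/A (no score)
Answers = Dict[str, Optional[bool]]


def wqa_score(answers: Answers) -> Tuple[int, int]:
    """Return (total WQA score, number of questions that were applicable)."""
    applicable = [a for a in answers.values() if a is not None]
    total = sum(1 for a in applicable if a)
    return total, len(applicable)


# Hypothetical answers for one website (question IDs are illustrative only).
example = {"Q1": True, "Q2": False, "Q3": None, "Q4": True}
print(wqa_score(example))  # (2, 3)
```

Returning the number of applicable questions alongside the total is an addition for illustration only; the paper reports just the total score, but keeping N/A items visible makes it easier to see why two websites with the same total might differ.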
In the development of the WQA tool, the following criteria were used:
- Is there a listing of the questions likely to be asked by the searcher?
- What myths or misinformation exist on the topic that may need to be dispelled or corrected?
- What information could be required by the searcher in order to assist in making informed decisions?
- Are there relevant comparisons or analogies that might help a non-scientist or non-clinician understand the topic?
- Is there any associated or corollary information from other related topics or areas that might be helpful to support understanding of the topic?
The number of assessment questions will be determined by the topic in question and is not fixed. However, the WQA questions used should cover all 5 of the criteria listed above.
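Because the question set varies by topic, a simple check that each of the 5 criteria is represented can be useful when drafting a new WQA. The sketch below assumes that each draft question is tagged with one criterion; the enum names and the example questions are hypothetical and are not taken from the 22-question Generic Medicines WQA.

```python
from enum import Enum
from typing import Dict, Set


class Criterion(Enum):
    """The 5 WQA development criteria (short names are illustrative)."""
    SEARCHER_QUESTIONS = "questions likely to be asked by the searcher"
    MYTHS = "myths or misinformation to dispel or correct"
    DECISION_INFO = "information needed for informed decisions"
    ANALOGIES = "comparisons or analogies for non-specialists"
    RELATED_INFO = "associated or corollary information from related areas"


def uncovered_criteria(questions: Dict[str, Criterion]) -> Set[Criterion]:
    """Return the criteria that no draft question addresses yet."""
    return set(Criterion) - set(questions.values())


# Hypothetical draft question bank for a generic medicines website.
draft = {
    "Does the site explain what a generic medicine is?": Criterion.SEARCHER_QUESTIONS,
    "Does the site address common myths about generics?": Criterion.MYTHS,
    "Does the site explain how to ask a pharmacist about switching?": Criterion.DECISION_INFO,
}
for c in sorted(uncovered_criteria(draft), key=lambda c: c.name):
    print("Not yet covered:", c.value)
```

Tagging each question with a single criterion is a simplification; a question that serves two criteria could instead map to a set of criteria.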
Validation of the WQA Tool
To validate the tool, all searches were performed on Google (google.com), and a number of the websites returned in the SERPs were assessed using the 22-question Generic Medicines WQA (). The searches were carried out in five English-speaking countries (the United States, Canada, Ireland, Great Britain, and Australia), using computers with Internet protocol (IP) addresses in those countries, to determine whether there was any country-to-country (geographic) variability. The search term was identical in all cases: “generic drug OR medicine” (without the quotes). All searches were performed during March and April of 2012, and a total of 24 distinct websites were assessed.
To measure reproducibility of use of the tool, each of the websites was independently assessed by 2 different raters.
Assessment of Website Readability
Readability of text is an important issue, especially in the medical domain. In this study, readability was assessed using two methods: (1) the Flesch Reading Ease score and (2) the Flesch-Kincaid Grade Level. Other readability evaluation methods have also been used in the assessment of medical texts.
A sample of at least 100 words of continuous text was selected at random from each website and pasted into Microsoft Word. This text was then analyzed using the readability statistics in MS Word.
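Because the sample is chosen at random, two raters will usually analyze different passages from the same website. The sketch below shows one possible way to draw a random passage of continuous text of at least 100 words made up of whole sentences. The regex-based sentence splitter and the choice to start at a sentence boundary are assumptions for illustration, not the procedure the raters followed.

```python
import random
import re
from typing import Optional


def sample_continuous_text(text: str, min_words: int = 100,
                           seed: Optional[int] = None) -> str:
    """Pick a random run of consecutive sentences totaling >= min_words words.

    Sentences are split naively after '.', '!' or '?' followed by whitespace.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    counts = [len(s.split()) for s in sentences]
    # Words available from each sentence to the end of the text.
    remaining = [sum(counts[i:]) for i in range(len(counts))]
    starts = [i for i, r in enumerate(remaining) if r >= min_words]
    if not starts:
        return text.strip()  # whole text is shorter than the minimum
    rng = random.Random(seed)
    i = rng.choice(starts)
    chosen, total = [], 0
    while total < min_words:  # always stops: remaining[start] >= min_words
        chosen.append(sentences[i])
        total += counts[i]
        i += 1
    return " ".join(chosen)
```

Calling the function with two different seeds, standing in for two independent raters, will typically return different passages from the same page, which is one source of the between-rater variation in readability scores.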
MS Word’s Flesch Reading Ease score is based on a formula developed in 1948 by Rudolf Flesch. It is computed from the average number of syllables per word and the average number of words per sentence. Syllables per word is a measure of word difficulty; words per sentence is an indicator of syntactic complexity.
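The Flesch Reading Ease formula itself is short, as the sketch below shows. It takes word, sentence, and syllable counts as inputs; how MS Word counts syllables internally is not described here, so the syllable count is treated as a given input rather than computed.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch (1948) Reading Ease; higher scores mean easier text.

    206.835 - 1.015 * (words per sentence) - 84.6 * (syllables per word)
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    words_per_sentence = words / sentences   # syntactic complexity
    syllables_per_word = syllables / words   # word difficulty
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical 100-word sample: 5 sentences, 150 syllables.
print(round(flesch_reading_ease(100, 5, 150), 1))  # 59.6
```

The syllables-per-word term carries by far the larger weight, so replacing long words with shorter ones usually raises the score more than shortening sentences does.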
The Flesch Reading Ease scale ranges from 0 to 100. Scores of 0 to 50 indicate very difficult to difficult reading; scores of 80 and above indicate easy to very easy reading. Flesch himself set the minimum score for plain English at 60. Microsoft’s documentation encourages authors of standard documents to aim for a score of 60 to 70 [ , ].
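The score ranges mentioned above belong to a fuller set of interpretation bands commonly reproduced from Flesch's 1948 table. A sketch mapping a score to its band, together with the plain English minimum of 60 and the 60 to 70 target range, might look like this; the band labels follow that commonly cited table rather than anything defined in this paper.

```python
from typing import List, Tuple

# Lower bound of each band, easiest first (commonly cited Flesch table).
BANDS: List[Tuple[float, str]] = [
    (90, "very easy"),
    (80, "easy"),
    (70, "fairly easy"),
    (60, "standard"),
    (50, "fairly difficult"),
    (30, "difficult"),
]


def flesch_band(score: float) -> str:
    """Return the descriptive band for a Flesch Reading Ease score."""
    for lower, label in BANDS:
        if score >= lower:
            return label
    return "very difficult"


def is_plain_english(score: float) -> bool:
    """Flesch's minimum score for plain English is 60."""
    return score >= 60


def in_word_target(score: float) -> bool:
    """The 60-70 range suggested for standard documents."""
    return 60 <= score <= 70
```

For example, a sample scoring 59.6 falls in the "fairly difficult" band and just misses the plain English minimum.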
The Flesch-Kincaid Grade Level, developed in 1975, expresses the readability of a document as the US school grade level that a reader would need to understand it. Microsoft recommends aiming for a Flesch-Kincaid score of 7.0 to 8.0 for most documents. According to a 1993 study, the average adult in the United States reads at the seventh-grade level, and the authors of that study recommended that materials for the public be written at a fifth- or sixth-grade reading level [ ].
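The Flesch-Kincaid Grade Level uses the same two ratios as the Reading Ease score, words per sentence and syllables per word, but weights them so that harder text gives a higher grade. A minimal sketch, again taking the counts as inputs and checking the result against the 7.0 to 8.0 range mentioned above:

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level (1975): approximate US school grade.

    0.39 * (words per sentence) + 11.8 * (syllables per word) - 15.59
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical 100-word sample: 5 sentences, 150 syllables.
grade = flesch_kincaid_grade(100, 5, 150)
print(round(grade, 2))        # 9.91
print(7.0 <= grade <= 8.0)    # False: above the suggested range
```

Because both formulas depend on the same ratios, a text that scores higher on Reading Ease will generally score lower on Grade Level, which is why the two measures are expected to correlate with SERP ranking in opposite directions.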
The mean and standard deviation of the differences between the 2 reviewers for all three tools (WQA, Flesch Reading Ease score, and Flesch-Kincaid Grade Level) were used to calculate limits of agreement, which are represented graphically in Bland-Altman plots. The intraclass correlation coefficient (ICC) was used to measure reproducibility. Spearman correlation coefficient (rs) was used to measure the association between the ranking of websites and their WQA and readability scores. Absolute values of rs>0.3 were considered to represent moderate correlations, and values >0.5 strong correlations. The scores from the developer of the assessment tool (SD) were used in the correlation analyses. The correlation between website ranking and WQA score was also used to demonstrate the predictive validity of this newly developed assessment tool.
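The agreement statistics described above can be sketched in plain Python. Two assumptions are made: the limits of agreement use the conventional Bland-Altman multiplier of 1.96 standard deviations, and the ICC is computed in the two-way random-effects, absolute-agreement, single-rater form (ICC(2,1) in Shrout and Fleiss's notation), because the paper does not state which ICC form was used. The example ratings are invented.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def limits_of_agreement(a: Sequence[float],
                        b: Sequence[float]) -> Tuple[float, float, float]:
    """Bland-Altman: (mean difference, lower limit, upper limit) for a minus b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length series with at least 2 items")
    diffs = [x - y for x, y in zip(a, b)]
    d, s = mean(diffs), stdev(diffs)  # stdev is the sample SD (n - 1)
    return d, d - 1.96 * s, d + 1.96 * s


def icc_2_1(ratings: List[List[float]]) -> float:
    """ICC(2,1): rows are websites (n), columns are raters (k)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between websites
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_err = ss_total - ss_rows - ss_cols                    # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Invented WQA totals from two raters for five websites.
rater_1 = [18, 12, 15, 9, 20]
rater_2 = [17, 12, 16, 9, 19]
print(limits_of_agreement(rater_1, rater_2))
print(round(icc_2_1([list(p) for p in zip(rater_1, rater_2)]), 3))
```

The absolute-agreement form penalizes a constant offset between raters, so two raters whose scores differ by exactly 1 point on every website get an ICC below 1; a consistency-type ICC would not.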
Validation of the WQA Tool
Bland-Altman analysis of the scores from the 2 independent raters (SSD and NC) showed that, for WQA assessments of the websites, the mean difference (SSD minus NC), represented by the solid black line in panel a) of Multimedia Appendix 1, was zero (SD 1.18), indicating no systematic difference between the raters on average. The median difference was also zero (range –3 to 2). Only one observation fell outside the limits of agreement: a website consisting of a list of brand-name medicines alongside the names of their generic counterparts. One rater performed the WQA based on this list alone, whereas the second rater looked for information on other pages of the website, which accounts for the difference in the WQA ratings awarded. An ICC value of 0.94 indicated excellent reproducibility between different users.
Similar analysis of the readability of the websites using the Flesch Reading Ease score (on a scale of 0 to 100) and the Flesch-Kincaid Grade Level (on a scale of 1 to 18) showed comparable levels of agreement (see panels b) and c) of Multimedia Appendix 1). The mean difference (SSD minus NC) for the Reading Ease score was 4.66 (SD 12.06), indicating that rater SSD scored slightly higher than NC on average. The mean difference (SSD minus NC) for Grade Level was -1.79 (SD 2.86), indicating that rater SSD scored slightly lower than NC on average. In each case, one observation fell outside the limits of agreement. Because each rating was performed independently, the raters were likely to have taken different sections of text from each website, and this variation in the text sampled most likely accounts for the single observation outside the limits of agreement. ICC values of 0.71 for the Flesch Reading Ease score and 0.63 for the Flesch-Kincaid Grade Level demonstrate moderate to strong reproducibility, particularly given the subjective element of this type of assessment and the possible variability in the text selected by reviewers.
Overall, the WQA and readability scores demonstrate acceptable reproducibility when the tools are used by more than 1 rater.
Correlation Between WQA Score and SERP Ranking
Scatterplots of WQA score against ranking on Google SERPs in different regions (United States, Canada, Ireland, United Kingdom, and Australia) are given in Multimedia Appendix 2. Spearman correlation coefficients showed a moderate to strong correlation between WQA score and ranking on Google SERPs ( ). Because the top result on a SERP has rank 1, a negative coefficient means that websites with higher WQA scores tended to appear nearer the top of the results. The relationship was observed in the Google searches performed in all five regions, suggesting that the correlation holds regardless of the location or IP address of the searcher’s computer, at least across the countries studied. The strongest correlation (rs=-0.67) was seen in the Google search performed in the United States.
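Spearman's rs can be computed as the Pearson correlation of ranks, with tied values given the average of their positions. The sketch below does this in plain Python and labels the result using the thresholds given in the Methods (|rs|>0.3 moderate, >0.5 strong); the "below moderate" label and the example data are invented for illustration and are not the study's results.

```python
from math import sqrt
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rs as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def strength(rs: float) -> str:
    """Label |rs| with the thresholds from the Methods."""
    if abs(rs) > 0.5:
        return "strong"
    if abs(rs) > 0.3:
        return "moderate"
    return "below moderate"


# Invented example: SERP position (1 = top) and the WQA score of each site.
serp_rank = [1, 2, 3, 4, 5, 6]
wqa = [19, 20, 15, 15, 11, 8]
rs = spearman(serp_rank, wqa)
print(round(rs, 2), strength(rs))  # negative: higher scores sit nearer the top
```

Only the sign convention changes for Grade Level: since harder text has a higher grade, a positive rs against rank is the pattern that favors readable websites.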
Therefore, using the WQA assessment questions while developing information for inclusion in a medical information website could be a step towards achieving higher Google SERP rankings and, therefore, exposure to a greater potential audience.
Correlation of Readability With SERP Ranking
In general, there was also a relationship between readability and ranking on Google searches (). Flesch Reading Ease scores were correlated with the SERP ranking of the websites in each country; again, the strongest relationship was seen in the US Google search (rs=-0.64). In general, the top-ranked sites (placed 1, 2, etc) tended to have higher Reading Ease scores. Because of the small sample sizes in the study (at most 10 websites in each domain) and the resulting low statistical power, a descriptive analysis is presented and no hypothesis tests were carried out.
Similarly, Flesch-Kincaid Grade Level scores were correlated with the SERP ranking of the websites. In general, the top-ranked sites tended to have lower grade level values, with the strongest relationship again seen in the US search (rs=0.68). This suggests that websites that are easier to read are more likely to rank high in, and therefore be accessed from, Google SERPs.
Prior to publication of a website, the information to be disseminated to the intended audience must be gathered and written. Developing and using a topic-specific WQA-type assessment during the design phase of a medical/health care information website should ensure that the information put into the website is of sufficient quality to satisfy potential searchers and users. The WQA can be used to assess drafts of the information to be published. The positive and negative scoring employed by the WQA (positive scoring for information that is necessary, of good quality, and needed to support the integrity of the website; negative scoring for any information that is inaccurate, biased, or that may detract from the integrity of the information) ensures that all aspects of the information-gathering initiative are accounted for during website design.
As the Internet is one of the first places a patient is likely to go when searching for medical information, and given that Google is the primary search engine in use worldwide, holding almost 90% of the global search engine market [ ], use of the WQA could lead to higher rankings on Google SERPs for websites that use this tool in their design and development.
Furthermore, this study found that websites that are easier to read tend to rank higher in, and are therefore more likely to be accessed from, Google searches. Including Flesch Reading Ease and Flesch-Kincaid Grade Level assessments as part of the WQA therefore enables a more comprehensive assessment of how a website might perform in Google searches: in this study, high readability scores and high WQA scores were both associated with high Google SERP rankings.
A limitation of this study is the small number of websites assessed. Further studies could use technology such as a web crawler to gather data on many more websites, allowing clusters or commonalities across a spectrum of similar websites to be examined. A further study could evaluate sites containing similar content but focus instead on usability and accessibility: for example, are the sites well designed, are they pleasing to the eye, and is the navigation user-friendly? Separating the assessment of content from that of design and visual presentation would provide further insight into the usability and accessibility of medical information websites, complementing the findings in this paper. Indeed, such a study, if performed on websites focused on generic medicines, may provide insight into the adoption and penetration of these medicines in different markets worldwide.
Readability formulas, additionally, have limitations in that a favorable score may not always be fully indicative of clarity of information (for instance, not all low-syllable words are always clearly understood, shorter sentences are not always necessarily easier to read, and inferences may be required that may increase the complexity of the text). Therefore, these formulas need to be used in conjunction with other plain language guidelines when writing for provision of health care information (especially for low literacy and limited English proficiency audiences), and not used as sole measures of understandability.
With about 16% of adults in England described as “functionally illiterate” (ie, having literacy levels at or below those expected of an 11-year-old), and the International Adult Literacy Survey showing that 1 in 4 adults in the Republic of Ireland have problems with even the simplest literacy tasks [ ] (with similar rates seen in the United States [ ] and Canada [ ]), it is fair to say that writing medical information websites with literacy in mind may be the most important aspect of providing medical information to the general public. This point, of course, applies to all printed material (eg, pamphlets given to patients), not just information published online. Arguably, it follows that training writers of medical information intended for the general public in presenting simple, clear language is important in ensuring that the public understands the information that health care professionals are trying to impart. This is particularly important in light of research showing that there is often a discrepancy between the information a physician believes a patient to have and what the patient actually understands [ ].
Language complexity as a barrier to accessing information has been recognized by Wikipedia, the 6th most commonly accessed website in the world. As a solution, Wikipedia is available in both English and Simple English, the Simple version being intended to be more accessible through simplified language and a limited vocabulary. Wikipedia’s guidelines on writing the Simple English version may therefore be of use to those creating medical information websites for the general public [ ].
Overall, use of the WQA tool in the planning and preparation of material for medical information websites, alongside an assessment of readability of the written material, is likely to ensure that the website subsequently ranks higher in Google SERPs and is thus more likely to be accessed, as well as read and understood, by the intended audience.
The authors would like to thank Ms YT Chueh and Dr Phil Hensche for their help in performing the Internet searches. This work was supported in part by a scholarship from the Faculty of Education and Health Sciences, University of Limerick, Ireland.
Conflicts of Interest
Multimedia Appendix 1
Bland-Altman plots for WQA, Flesch Reading Ease Score, and Flesch Kincaid grade level.
Multimedia Appendix 2
Scatterplots of WQA score against rankings on .com domains.
- Langille M, Bernard A, Rodgers C, Hughes S, Leddin D, van Zanten SV. Systematic review of the quality of patient information on the internet regarding inflammatory bowel disease treatments. Clin Gastroenterol Hepatol 2010 Apr;8(4):322-328.
- Livas C, Delli K, Ren Y. Quality evaluation of the available Internet information regarding pain during orthodontic treatment. Angle Orthod 2013 May;83(3):500-506.
- Fraval A, Ming Chong Y, Holcdorf D, Plunkett V, Tran P. Internet use by orthopaedic outpatients - current trends and practices. Australas Med J 2012;5(12):633-638.
- Colón Y. Searching for pain information, education, and support on the Internet. J Pain Palliat Care Pharmacother 2013 Mar;27(1):71-73.
- Peterson MW, Fretz PC. Patient use of the internet for information in a lung cancer clinic. Chest 2003 Feb;123(2):452-457.
- Helft PR, Eckles RE, Johnson-Calley CS, Daugherty CK. Use of the internet to obtain cancer information among cancer patients at an urban county hospital. J Clin Oncol 2005 Aug 1;23(22):4954-4962.
- Vordermark D, Kölbl O, Flentje M. The Internet as a source of medical information. Investigation in a mixed cohort of radiotherapy patients. Strahlenther Onkol 2000 Nov;176(11):532-535.
- Powell J, Clarke A. Internet information-seeking in mental health: population survey. Br J Psychiatry 2006 Sep;189:273-277.
- Lissman TL, Boehnlein JK. A critical review of internet information about depression. Psychiatr Serv 2001 Aug;52(8):1046-1050.
- AlSaadi MM. Evaluation of internet use for health information by parents of asthmatic children attending pediatric clinics in Riyadh, Saudi Arabia. Ann Saudi Med 2012;32(6):630-636.
- Agricola E, Gesualdo F, Pandolfi E, Gonfiantini MV, Carloni E, Mastroiacovo P, et al. Does googling for preconception care result in information consistent with international guidelines: a comparison of information found by Italian women of childbearing age and health professionals. BMC Med Inform Decis Mak 2013;13:14.
- Gondim AP, Weyne DP, Ferreira BS. Quality of health and medication information on Brazilian websites. Einstein (Sao Paulo) 2012 Sep;10(3):335-341.
- Del Giglio A, Abdala B, Ogawa C, Amado D, Carter D, Gomieiro F, et al. Quality of internet information available to patients on websites in Portuguese. Rev Assoc Med Bras 2012;58(6):645-649.
- Chou WY, Prestin A, Lyons C, Wen KY. Web 2.0 for health promotion: reviewing the current evidence. Am J Public Health 2013 Jan;103(1):e9-18.
- Dunne S, Shannon B, Dunne C, Cullen W. A review of the differences and similarities between generic drugs and their originator counterparts, including economic benefits associated with usage of generic medicines, using Ireland as a case study. BMC Pharmacol Toxicol 2013;14:1.
- Baumgärtel C. Myths, questions, facts about generic drugs in the EU. GaBI J 2012 Feb 15;1(1):34-38.
- Auinger A, Brandtner P, Großdeßner P, Holzinger A. Findability and Usability as Key Success Factors. In: Proceedings of the International Conference on e-Business (DCNET/ICE-B/OPTICS). 2012 Presented at: International Conference on e-Business (DCNET/ICE-B/OPTICS); July 24-27, 2012; Rome.
- Holzinger A, Baernthaler M, Pammer W, Katz H, Bjelic-Radisic V, Ziefle M. Investigating paper vs. screen in real-life hospital workflows: Performance contradicts perceived superiority of paper in the user experience. International Journal of Human-Computer Studies 2011 Aug;69(9):563-570.
- Flesch R. A new readability yardstick. J Appl Psychol 1948 Jun;32(3):221-233.
- Stockmeyer NO. Using Microsoft Word's Readability Program. 2009. URL: http://www.michbar.org/journal/pdf/pdf4article1467.pdf [accessed 2013-03-17]
- Test your document's readability. URL: http://office.microsoft.com/en-us/word-help/test-your-document-s-readability-HP010148506.aspx [accessed 2013-03-17]
- Kincaid JP, Fishburne RP, Rogers RL, Chissom BS. Derivation of New Readability Formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy Enlisted Personnel. Springfield, Virginia: National Technical Information Service; 1975. URL: http://digitalcollections.lib.ucf.edu/cdm4/document.php?CISOROOT=/IST&CISOPTR=26301&CISOSHOW=26253
- Diaz JA, Griffith RA, Ng JJ, Reinert SE, Friedmann PD, Moulton AW. Patients' use of the Internet for medical information. J Gen Intern Med 2002 Mar;17(3):180-185.
- StatCounter. Bing Overtakes Yahoo! Globally for First Time. BusinessWire; 2011. URL: http://www.businesswire.com/news/home/20110301006261/en/Bing-Overtakes-Yahoo!-Globally-Time-%D0-StatCounter [accessed 2013-03-17]
- How many illiterate adults are there in England?. URL: http://www.literacytrust.org.uk/adult_literacy/illiterate_adults_in_england [accessed 2013-03-17]
- Literacy in Ireland. URL: http://www.nala.ie/literacy/literacy-in-ireland [accessed 2013-03-17]
- IALS Results. National Center for Education Statistics URL: http://nces.ed.gov/surveys/all/results.asp [accessed 2013-03-17]
- Adult Literacy in OECD Countries: Technical Report on the First International Adult Literacy Survey. URL: http://nces.ed.gov/pubs98/98053.pdf [accessed 2013-03-17]
- Olson DP, Windish DM. Communication discrepancies between physicians and hospitalized patients. Arch Intern Med 2010 Aug 9;170(15):1302-1307.
- Top 500 Global Sites. 2013. URL: http://www.alexa.com/topsites [accessed 2013-03-17]
- Wikipedia: How to write Simple English pages. URL: http://simple.wikipedia.org/wiki/Wikipedia:How_to_write_Simple_English_pages [accessed 2013-03-17]
|ICC: intraclass correlation coefficient|
|IP: Internet protocol|
|SEO: search engine optimization|
|SERP: search engine results page|
|WQA: website quality assessment|
Edited by G Eysenbach; submitted 25.03.13; peer-reviewed by A Holzinger, F Bassetti; comments to author 12.05.13; revised version received 17.05.13; accepted 11.06.13; published 27.08.13
©Suzanne Dunne, Niamh Maria Cummins, Ailish Hannigan, Bill Shannon, Colum Dunne, Walter Cullen. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 27.08.2013.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.