This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
Web-based question and answer (Q&A) sites have emerged as an alternative source for serving individuals’ health information needs. Although a number of studies have analyzed user-generated content in web-based Q&A sites, there is insufficient understanding of the effect of disease complexity on information-seeking needs and the types of information shared, and little research has been devoted to the questions concerning multimorbidity.
This study aims to investigate seeking of health information in Q&A sites at different levels of disease complexity. Specifically, this study investigates the effects of disease complexity on information-seeking needs, types of information shared, and stages of disease development.
We first selected a random sample of 400 questions from each of two Q&A sites: Yahoo! Answers and WebMD Answers. Data cleaning resulted in a final set of 624 questions from the two sites. We used a mixed methods approach, including qualitative content analysis and quantitative statistical analysis.
The one-way ANOVA results showed significant effects of disease complexity (single vs multimorbid disease questions) on two information-seeking needs: diagnosis (
Our findings present implications for the design of web-based Q&A sites to better support health information seeking. Future studies should be conducted to validate the generality of these findings and apply them to improve the effectiveness of health information in Q&A sites.
Reports from the Pew Research Center indicate that an increasing number of people use web-based services to obtain health information, rising from 25% of Americans in 2000 to 72% in 2014 [
Web-based Q&A sites allow health information consumers (HICs) to post questions for other users to answer [
The motives of users participating in Q&A sites, especially in the long term, have been explained based on social theories [
Several studies have examined the information-seeking behaviors of HICs based on the characteristics of their questions and information needs [
One major limitation of previous studies is that they focused on single-disease entities in isolation, such as cancer [
This study aims to investigate health information sharing in web-based Q&A platforms with a specific focus on the effect of disease complexity, namely the characteristics of multiple-disease health questions as compared with characteristics of single-disease health questions. There is little understanding of the effect of disease complexity on the characteristics of health questions, and we do not know how deeply these differences run. More specifically, this paper aims to answer the following research questions: Are there any differences between questions relating to a single disease compared with those relating to multimorbidity in web-based Q&A platforms? How deep do these differences run in terms of information-seeking needs, types of information shared, and stages of disease development?
We chose kidney disease as the health topic. According to statistics from the National Institute of Diabetes and Digestive and Kidney Diseases, kidney disease ranges from simple conditions, such as small stones, to more serious illnesses such as chronic kidney disease, the prevalence of which is estimated to be 14% in the general population [
We collected data from an expert-curated Q&A site, WebMD Answers, and a community-based site, Yahoo! Answers. WebMD is one of the most influential web-based health sites [
Yahoo! Answers features health as one of the top-level categories; therefore, it was selected as a community-based Q&A site. Since its creation in December 2005, Yahoo! Answers has become a popular internet reference site worldwide, and it is the most frequented community Q&A site in the United States. As of June 2019, the site ranked ninth in global internet traffic and engagement over the past 90 days and seventh in the United States [
Given the selection of kidney disease, we screened questions based on the following key terms: kidney, kidney infection, kidney stone, kidney cancer, kidney disease, chronic kidney disease, dialysis, kidney failure, renal artery stenosis, and renal cell carcinoma. These key terms were selected as they directly refer to kidney conditions or early signs of chronic kidney disease [
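The keyword screen described above can be sketched as a case-insensitive phrase match. This is a minimal illustration rather than the procedure actually used in the study: the helper name `matches_key_terms` and the sample questions are hypothetical, and the text does not state whether matching was applied to question titles, descriptions, or both.

```python
import re

# Key terms as listed in the text above.
KEY_TERMS = [
    "kidney", "kidney infection", "kidney stone", "kidney cancer",
    "kidney disease", "chronic kidney disease", "dialysis",
    "kidney failure", "renal artery stenosis", "renal cell carcinoma",
]

# One case-insensitive pattern. A word boundary is required only at the
# start of a term, so plural forms such as "kidney stones" still match.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in KEY_TERMS) + r")",
    re.IGNORECASE,
)


def matches_key_terms(text: str) -> bool:
    """Return True if the text mentions any key term (hypothetical helper)."""
    return _PATTERN.search(text) is not None


# Hypothetical sample questions, for illustration only.
sample_questions = [
    "Can kidney stones cause back pain?",
    "What helps a sore throat?",
    "Is DIALYSIS painful?",
]
screened = [q for q in sample_questions if matches_key_terms(q)]
```

A screen of this kind only narrows the candidate pool; irrelevant matches (for example, questions about nonhuman subjects) would still need the manual cleaning step.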
We manually removed noise such as advertisements, irrelevant questions (non–kidney-related, nonhuman subjects, or student projects), and posts that did not contain an actual question. The data cleaning process left 316 questions from Yahoo! Answers and 308 from WebMD Answers, for a final set of 624 questions. For each included question, we extracted the title, description, date of posting, categories under which the question was posted, number of answers, and the answers themselves.
The data sets contained both qualitative information (the actual content) and quantitative information (such as dates of posting and replies, demographics of participants, topic, and length and number of questions). Analyzing both kinds of information required a mixed methods approach. In this study, we focused on the questions themselves.
Content analysis is widely used as a qualitative research method for analyzing questions and answers from web-based Q&A websites [
The independent variable was disease complexity, with two levels: single and multiple (multimorbidity). We measured the variables based on the number of diseases described in the health questions. The health question was labeled as
We contextualized the health questions by adapting relevant findings from a previous study on health information seeking [
HICs ask health-related questions to address specific information needs [
Symptom: to gain an understanding of the symptoms of a kidney or any other related disease.
Diagnosis: to confirm the nature of a certain disease.
Causes: to figure out the causes of the disease.
Prognoses: to inquire about the hypothetical effect of a disease.
Treatment: to explore treatment alternatives to kidney disease.
Supplements and lifestyle: to explore lifestyle and diet in people with kidney disease and the use of different supplements.
Information sources, medical profession, and related types of information: to look for medical experts in the field and any kind of resources to fulfill HIC information needs.
Drug interaction: to ask for more details about unfavorable and unexpected signs, symptoms, or diseases associated with the use of a drug without any judgment about the causality or relationship to drug use.
Similar experiences: to connect to patients with similar conditions.
Diagnosing and treating a disease or condition is an ongoing process. HICs at different stages in this process often have different levels of information needs [
Multimorbidity is strongly associated with chronic disease. Accordingly, we also drew on the stages of chronic illness in understanding the questions of HICs who are chronically ill. To this end, stages of disease development were extended to include the chronic illness trajectory framework [
HICs provide demographic and medical information in their questions that represent their understanding of their diseases and communicate their information needs to others [
To facilitate question analysis, we developed a web-based annotation system. Two annotators coded the questions independently, with one having a medical degree. Each coder was asked to determine whether any of the categories in the coding schema was present or absent in the question. The questions were split randomly between the coders with 20% overlap to check for intercoder agreement. A comparison between the two sets of coding results showed that the intercoder agreement over the overlapping data was 87.9%. Discrepancies in the coding results were discussed and resolved among the coders, who then used the results of this discussion to review and revise the overall questions until they reached an agreement.
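As an illustration of how agreement on the overlapping questions might be computed, the sketch below calculates simple percent agreement over presence/absence codes. It rests on assumptions: the text does not state whether the reported 87.9% was computed per coding decision or per question, and the function name, category names, and toy data are hypothetical.

```python
from typing import Dict, List

# One coded question: category name -> 1 (present) or 0 (absent).
Codes = Dict[str, int]


def percent_agreement(coder_a: List[Codes], coder_b: List[Codes]) -> float:
    """Share of (question, category) decisions on which both coders agree.

    Assumption: agreement is counted per decision, not per question.
    """
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must code the same overlapping questions")
    agree = 0
    total = 0
    for codes_a, codes_b in zip(coder_a, coder_b):
        for category, value_a in codes_a.items():
            total += 1
            agree += int(value_a == codes_b[category])
    if total == 0:
        raise ValueError("No coding decisions to compare")
    return agree / total


# Hypothetical toy data: two overlapping questions, three categories each.
coder_a = [
    {"symptom": 1, "diagnosis": 0, "treatment": 1},
    {"symptom": 0, "diagnosis": 1, "treatment": 0},
]
coder_b = [
    {"symptom": 1, "diagnosis": 0, "treatment": 0},
    {"symptom": 0, "diagnosis": 1, "treatment": 0},
]
agreement = percent_agreement(coder_a, coder_b)  # 5 of 6 decisions agree
```

Percent agreement does not correct for agreement expected by chance; a chance-corrected statistic would be a different measure from the one reported here.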
Descriptive statistics for the data sets are presented in
Descriptive statistics of the data sets and analysis of variance results of disease complexity.
Variable | Single (n=531; 85.1%), mean (SD) | Multimorbidity (n=93; 14.9%), mean (SD) | F test (df) | P value |
Information-seeking needs | | | | |
Symptom | 0.12 (0.33) | 0.08 (0.27) | 1.72 (1,622) | .19 |
Cause | 0.19 (0.39) | 0.14 (0.35) | 1.16 (1,622) | .28 |
Diagnose | 0.22 (0.41) | 0.12 (0.32) | 5.08 (1,622) | .02a |
Treatment | 0.16 (0.37) | 0.26 (0.44) | 4.82 (1,622) | .02a |
Prognoses | 0.13 (0.33) | 0.17 (0.38) | 1.31 (1,622) | .25 |
Drug | 0.04 (0.19) | 0.06 (0.25) | 1.42 (1,622) | .23 |
Lifestyle | 0.13 (0.34) | 0.15 (0.36) | 0.29 (1,622) | .59 |
Similar | 0.03 (0.17) | 0.05 (0.23) | 1.66 (1,622) | .19 |
Source | 0.06 (0.23) | 0.09 (0.28) | 1.03 (1,622) | .31 |
Others | 0.23 (0.42) | 0.28 (0.45) | 1.26 (1,622) | .26 |
Stages of disease development | | | | |
Health stages of questions | 3.92 (3.43) | 6.51 (2.59) | 48.02 (1,622) | <.001b |
Chronic stages | 1.04 (1.6) | 2.4 (1.87) | 54.01 (1,622) | <.001b |
Types of information shared | | | | |
Demographic | 0.21 (0.41) | 0.48 (0.5) | 32.24 (1,622) | <.001b |
Medical diagnosis | 0.4 (0.49) | 0.58 (0.5) | 11.04 (1,622) | <.001b |
Treatment and prevention | 0.25 (0.43) | 0.44 (0.5) | 14.55 (1,622) | <.001b |
aSignificant as
bSignificant as
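As a rough check on how the F values in the table relate to the group means and SDs, the sketch below recomputes a one-way ANOVA F statistic from summary statistics alone. It is an illustration under stated assumptions, not the authors' analysis code: the function name is hypothetical, the tabulated SDs are assumed to be sample SDs (n-1 denominator), and because the tabulated means and SDs are rounded to two decimals, the result only approximates the reported value.

```python
from typing import List, Tuple

# One group summary: (n, mean, SD).
Group = Tuple[int, float, float]


def one_way_anova_from_summary(groups: List[Group]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) from per-group n, mean, and SD.

    Assumption: each SD is a sample SD, so (n - 1) * SD**2 recovers the
    within-group sum of squares.
    """
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / n_total
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# "Diagnose" row of the table: single (n=531) vs multimorbidity (n=93).
f_stat, df1, df2 = one_way_anova_from_summary(
    [(531, 0.22, 0.41), (93, 0.12, 0.32)]
)
print(f"F({df1},{df2}) = {f_stat:.2f}")
```

With these rounded inputs, the statistic comes out close to 5 on (1,622) degrees of freedom, in line with the tabulated 5.08 for the diagnosis category; the small gap is consistent with rounding in the published means and SDs.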
The ANOVA results (
The post-hoc comparison showed that the single-disease questions tend to include more diagnostic information (mean 0.22, SD 0.41;
There were statistically significant differences in the health stage (
The ANOVA results revealed the effects of disease complexity on the types of information shared, including demographic information (
Illustrations of information seeking and sharing behavior in single-disease health questions.
Illustrations of information seeking and sharing behavior in multi-disease health questions.
This research aims to investigate whether disease complexity affects information-seeking needs, stages of disease development, and the type of shared information on Q&A sites. Our empirical data analysis results revealed several significant effects.
Questions relating to single diseases were more likely to include questions about diagnosis when compared with multimorbid questions. This is not in line with medical care assumptions that underdiagnoses are a real threat to managing patients with multiple chronic conditions [
Another major finding concerns the disease stage. Multimorbid questions focused more frequently on advanced stages of disease development. We offer several possible explanations for this finding. First, HICs at a later stage of a chronic disease may need more information to validate their choices, such as detailed clinical information, which experts may be unable to provide for various reasons. Second, patients might understand that care is patient centered and that it is common for treatments to differ, especially across different states or countries. Third, patients at later stages may require more general information, such as lifestyle advice, for which peers can serve as valuable information sources.
Disease complexity was found to play a major role in determining the types of information shared on web-based Q&A sites. In particular, the multimorbid HICs included more demographic and medical information in their questions, which included information related to diagnosis and information relating to treatment and prevention. Although the management of chronic disease is a highly collaborative process between patients and providers, the work that patients must take on during the various stages of chronic disease progression is immense [
This study makes several novel scientific contributions to health consumer informatics. To the best of our knowledge, this is the first study to empirically investigate the effects of disease complexity on the types of information shared in two Q&A sites. In addition, this is the first study to examine the disease stages of HICs with respect to disease complexity. Overall, this study provides a comprehensive examination of a wide variety of question characteristics, which is unique among studies on web-based health Q&A sites.
This study used mixed methods that combine qualitative and quantitative methods to understand health information–seeking behavior in web-based Q&A sites. In addition, the study considered two common types of Q&A sites, community based and expert curated, for analyzing health information–seeking behavior.
Unlike previous studies that view information sharing and information seeking as two conflicting goals [
Our research findings have several implications for improving web-based Q&A sites. The information needs of HICs with chronic diseases may change over time as their disease evolves, for example, when it substantially disrupts their everyday lives. As a result, effective support from peers who share similar characteristics and experiences would be very helpful. It is worth noting that HICs with a chronic disease grow their knowledge as they continuously manage their conditions and take more responsibility for their illnesses [
This study provides strong evidence that multimorbid HICs share both demographic and medical information when they seek health information on the web. Although web-based Q&A sites encourage the exchange of a significant amount of health information, these sites can benefit from improving the organization of information for community reuse. For instance, these sites could better support HICs with specific information needs by organizing the questions based on specific types of information needs, such as diagnosis, treatment, and side effects, and by encouraging them to share their demographic and medical information without compromising their personal privacy.
Our study has several limitations. First, we collected health questions from only two types of Q&A sites, so caution should be exercised when generalizing the findings to other Q&A sites. For instance, Twitter has been used for Q&A in the health context; building a data set of health Q&A tweets and using it to validate the findings of this study would enrich the Q&A literature. Second, multiple-disease questions made up a much smaller proportion of the sample than single-disease questions. Third, we chose to focus on kidney disease as the health topic. Although the findings of this study are expected to extend to other chronic diseases, they still require empirical validation. Fourth, in addition to the health questions, which are the focal point of the analysis in this study, Q&A sites provide other types of information, such as user profiles, comments, answers, and ratings. Integrating information from these multiple dimensions is expected to yield a deeper understanding of the web-based behavior of HICs. On the other hand, the disclosure of an increasing amount of personal health information may raise privacy concerns. Thus, how HICs trade off information needs against information disclosure is an interesting question for future research. Finally, the qualitative approach used in this study helps uncover the contextual characteristics of questions but is difficult to scale. Text mining has the potential to address this limitation by automating the process, and the data generated through this study can serve as training data for building text-mining models.
Multimorbidity seems to play a major role in the stages of disease development and the types of information shared in health questions, suggesting that HICs with multiple diseases have a deeper understanding of the complexities of their conditions. Regarding information-seeking needs, the effect of disease complexity was limited: multimorbidity questions differed only in their greater emphasis on treatment, and single-disease questions differed only in their greater emphasis on diagnosis. This study is a valuable first step in investigating the effects of multimorbidity on different types of information shared in two Q&A sites. The findings present implications for designing web-based Q&A sites to better support health information seeking.
ANOVA: analysis of variance
HIC: health information consumer
Q&A: question and answer
AA developed the study, analyzed the data, and wrote the manuscript. LZ reviewed the manuscript and made the necessary edits.
None declared.