Published in Vol 21, No 4 (2019): April

Symptoms Prompting Interest in Celiac Disease and the Gluten-Free Diet: Analysis of Internet Search Term Data


Authors of this article:

Benjamin Lebwohl1; Elad Yom-Tov2,3

Original Paper

1Celiac Disease Center, Columbia University, New York, NY, United States

2Microsoft Research, Herzeliya, Israel

3Technion, Haifa, Israel

Corresponding Author:

Elad Yom-Tov, PhD

Microsoft Research

13 Shenkar Street

Herzeliya, 46733


Phone: 972 747111359


Background: Celiac disease, a common immune-based disease triggered by gluten, has diverse clinical manifestations, and the relative distribution of symptoms leading to diagnosis has not been well characterized in the population.

Objective: This study aimed to use search engine data to identify a set of symptoms and conditions that would identify individuals at elevated likelihood of a subsequent celiac disease diagnosis. We also measured the relative prominence of these search terms before versus after a search related to celiac disease.

Methods: We extracted English-language queries submitted to the Bing search engine in the United States and identified those who submitted a new celiac-related query during a 1-month period, without any celiac-related queries in the preceding 9 months. We compared the ratio between the number of times that each symptom or condition was asked in the 14 days preceding the first celiac-related query of each person and the number of searches for that same symptom or condition in the 14 days after the celiac-related query.

Results: We identified 90,142 users who made a celiac-related query, of whom 6528 (7%) exhibited sustained interest, defined as making a query on more than 1 day. Though a variety of symptoms and associated conditions were also queried before a celiac-related query, the maximum area under the receiver operating characteristic curve was 0.53. The symptom most likely to be queried more before than after a celiac-related query was diarrhea (query ratio [QR] 1.28). Extraintestinal symptoms queried before a celiac disease query included headache (QR 1.26), anxiety (QR 1.10), depression (QR 1.03), and attention-deficit hyperactivity disorder (QR 1.64).

Conclusions: We found an increase in antecedent searches for symptoms known to be associated with celiac disease, a rise in searches for depression and anxiety, and an increase in symptoms that are associated with celiac disease but may not be reported to health care providers. The protean clinical manifestations of celiac disease are reflected in the diffuse nature of antecedent internet queries of those interested in celiac disease, underscoring the challenge of effective case-finding strategies.

J Med Internet Res 2019;21(4):e13082



Celiac disease is a multisystem immune-based enteropathy characterized by autoantibodies to tissue transglutaminase and villous atrophy that is triggered by the ingestion of gluten in genetically predisposed individuals [1]. Present in nearly 1% of the US population, the seroprevalence of celiac disease has risen markedly in recent decades [2]. However, this rise in celiac disease prevalence has been eclipsed by a more dramatic rise in the avoidance of gluten among individuals who do not have celiac disease or have not been tested for celiac disease [3]. The reasons for gluten avoidance are manifold and include gastrointestinal symptoms [4], the avoidance of cardiometabolic complications [5], cognitive health [6], and treatment of autoimmune disease [7].

Although classical celiac disease consists of a malabsorption phenotype characterized by diarrhea and weight loss, the majority of patients with celiac disease are diagnosed with a nonclassical presentation that includes a heterogeneous set of symptoms and signs including osteoporosis, anemia, abnormal liver enzymes, neuropathy, infertility, and others [8]. Given these diverse clinical manifestations and the lack of signs and symptoms that are sensitive and specific for celiac disease, diagnosis can be elusive, and there is frequently a long delay between the onset of symptoms and the diagnosis of celiac disease. In 1 survey of adults in the United States, patients with celiac disease had symptoms for a mean of 11 years before diagnosis [9]. Patients with extraintestinal symptoms tend to have a longer diagnostic delay compared with those with intestinal symptoms [10].

Aside from clinical examination, researchers have used electronic medical records to screen for celiac disease, showing that analysis of free text included therein could aid in the early discovery of celiac disease [11]. Recently, another novel method for the ascertainment of symptoms preceding diagnoses has been proposed through the analysis of search engine queries [12]. For instance, search engine query analysis has been used to identify symptoms of pancreatic adenocarcinoma months before the disease diagnosis [13]. In this study, we examined search query data with the aim of identifying symptoms and conditions that are associated with a subsequent query for celiac disease and celiac disease–specific searches. We hypothesized that searches related to the modes of presentation of celiac disease would precede searches for celiac disease and/or the gluten-free diet. We aimed to measure the relative prominence of these search terms before versus after a search related to celiac disease. We also aimed to identify a set of symptoms and conditions that would identify individuals at elevated likelihood of a subsequent celiac disease diagnosis.

We extracted all English-language queries submitted to the Bing search engine between January 1, 2017, and October 31, 2017, by people in the United States. For each query, we extracted the time and date of the query, its text, an anonymous user identifier, and the zip code of the asker. Bing data are estimated to be a representative sample of US internet users [14].

Celiac-related queries (CRQs) were those queries that contained the words “celiac” or “gluten.” The queries were filtered to include only those queries by users who were active since at least September 1, 2017, and used CRQs during the month of October 2017, but not in the previous 9 months.
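As an illustration only, the cohort filter described above might be sketched as follows. The function names, the tuple layout of the query log, and the reading of "active since at least September 1, 2017" as "first observed query on or before that date" are assumptions made for this sketch, not details taken from the study's pipeline.

```python
from datetime import date

CRQ_WORDS = ("celiac", "gluten")


def is_crq(query_text):
    """True if the query contains "celiac" or "gluten" (case-insensitive)."""
    text = query_text.lower()
    return any(word in text for word in CRQ_WORDS)


def new_crq_users(queries, window_start=date(2017, 10, 1), active_by=date(2017, 9, 1)):
    """Return users whose first CRQ falls on or after window_start.

    queries: iterable of (user_id, query_date, query_text), assumed to cover
    January 1 to October 31, 2017. A user is kept only if their earliest
    observed query (of any kind) is on or before active_by (assumption).
    """
    first_seen = {}
    first_crq = {}
    for user, day, text in queries:
        if user not in first_seen or day < first_seen[user]:
            first_seen[user] = day
        if is_crq(text) and (user not in first_crq or day < first_crq[user]):
            first_crq[user] = day
    return {
        user
        for user, day in first_crq.items()
        if day >= window_start and first_seen[user] <= active_by
    }
```

Because the earliest CRQ per user is tracked, any user with a CRQ between January and September is excluded automatically, which corresponds to the "not in the previous 9 months" condition.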

In addition, we identified queries that could indicate celiac disease by finding those queries that contained 1 or more of the following terms: marsh score, duodenal biopsy, intestinal biopsy, beyond celiac, celiac disease foundation, tissue transglutaminase (also as an acronym: TTG), gliadin antibody, celiac clinical trials, celiac trials, or gluten trials.

Some people identified themselves in their queries as having celiac disease, through queries such as “I have celiac disease, can I eat rice?”. To identify these self-identified users [15], we found all mentions of “I have celiac” or “I was diagnosed with celiac” and manually inspected each to exclude irrelevant queries (eg, “do I have celiac?”). Obviously, not all people who have celiac identify themselves in their queries, but this subset of the population is likely composed of celiac patients, and we calculated the prevalence of CRQs in this subset.
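A minimal sketch of the candidate-finding step is given below. It only surfaces queries containing the two phrases and adds a rough flag for question-like wording; the patterns and the flag are illustrative assumptions, and they do not replace the manual inspection described above.

```python
import re

# Phrases named in the text; everything else here is a hypothetical helper.
SELF_ID = re.compile(r"\bi (have|was diagnosed with) celiac", re.IGNORECASE)

# Crude hint that the phrase is part of a question (eg, "do I have celiac?").
QUESTION = re.compile(
    r"\b(do|could|might|can|what if|how do i know if)\s+"
    r"i (have|was diagnosed with) celiac",
    re.IGNORECASE,
)


def self_id_candidates(queries):
    """Return (query, looks_like_question) pairs for manual review."""
    candidates = []
    for query in queries:
        if SELF_ID.search(query):
            candidates.append((query, bool(QUESTION.search(query))))
    return candidates
```

The flag is only a triage aid: a reviewer would still read each candidate, as the study did.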

We defined people with a passing interest in celiac disease as those who made CRQs during only 1 day. This contrasts with people with a sustained interest who made CRQs over more than 1 day.
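The definition above amounts to counting the distinct days on which each user made a CRQ. A short sketch, with a hypothetical function name and input layout:

```python
from collections import defaultdict


def classify_interest(crq_events):
    """Label each user 'passing' (CRQs on 1 day) or 'sustained' (more than 1 day).

    crq_events: iterable of (user_id, day), where day is any hashable
    calendar-day value (eg, a datetime.date or an ISO date string).
    """
    days_by_user = defaultdict(set)
    for user, day in crq_events:
        days_by_user[user].add(day)
    return {
        user: "sustained" if len(days) > 1 else "passing"
        for user, days in days_by_user.items()
    }
```

Several CRQs on the same day still count as passing interest under this definition.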

Symptoms mentioned in queries were identified by matching the text of queries to a list of 195 symptoms and their synonyms, as developed by Yom-Tov and Gabrilovich [16]. Similarly, medical conditions were found by extracting all 5521 diseases and their synonyms that appear in Wikipedia [17].
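Matching query text against a list of symptoms and synonyms could look roughly like the sketch below. The three-entry lexicon is a made-up stand-in for the 195-symptom list, and whole-word, case-insensitive matching is an assumption about how the matching was done.

```python
import re

# Hypothetical miniature lexicon: canonical symptom -> synonyms.
LEXICON = {
    "diarrhea": ["diarrhea", "diarrhoea", "loose stool", "loose stools"],
    "abdominal pain": ["abdominal pain", "belly pain"],
    "headache": ["headache", "head ache"],
}


def compile_lexicon(lexicon):
    """Build one whole-word, case-insensitive pattern per canonical symptom."""
    patterns = {}
    for canonical, synonyms in lexicon.items():
        # Longest synonyms first so longer phrases are preferred.
        ordered = sorted(synonyms, key=len, reverse=True)
        alternatives = "|".join(re.escape(s) for s in ordered)
        patterns[canonical] = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return patterns


def symptoms_in(query, patterns):
    """Return the set of canonical symptoms mentioned in a query."""
    return {name for name, pattern in patterns.items() if pattern.search(query)}
```

The same approach would apply to the list of medical conditions and their synonyms.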

Recipe searches provide a representative sample of the dietary consumption of individuals [18]. Therefore, to evaluate the changes in diet made by people with CRQs, we followed the methodology used previously [18] to identify queries for recipes and map them to ingredients therein.

We attempted to identify users with sustained interest from all users using their queries. To do this, we represented each user by the number of times they queried for each symptom and each medical condition before the first CRQ. A predictive model using either linear regression or random forest with 50 trees was constructed and tested using 10-fold cross-validation [19].
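The user representation and the fold assignment can be sketched in plain Python as below. The classifier itself (linear regression or a 50-tree random forest) is not implemented here; the function names, the event layout, and the round-robin fold assignment are assumptions for illustration.

```python
from collections import Counter


def feature_vector(user_events, first_crq_time, vocabulary):
    """Count mentions of each term strictly before the user's first CRQ.

    user_events: list of (time, term) pairs, where time is any comparable value.
    vocabulary: ordered list of symptom and condition names.
    """
    counts = Counter(term for time, term in user_events if time < first_crq_time)
    return [counts.get(term, 0) for term in vocabulary]


def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices); sample i is tested in fold i % k."""
    for fold in range(k):
        test = [i for i in range(n_samples) if i % k == fold]
        train = [i for i in range(n_samples) if i % k != fold]
        yield train, test
```

Each user then becomes one row of counts, and each row is held out for testing exactly once across the 10 folds.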

This study was approved by the Behavioral Sciences Research Ethics Committee of the Technion, approval number 2018-032.

Of 90,142 users with at least 1 CRQ, 83,614 users (93%) were found with passing interest and 6528 (7%) exhibited sustained interest. Of the 6528 people who had a sustained interest in celiac disease, 104 (1.6%) entered at least one celiac indicator (see Table 1) compared with 336 (0.7%) who had a passing interest and 0.001% in the general population of Bing users.

Table 1. Symptoms and conditions that appear with the highest probability in the 14 days before the first celiac-related query compared with the 14 days after it in the entire population and in the sustained user population.

| Category | All celiac queries | Ratio | Sustained celiac queries | Ratio |
|---|---|---|---|---|
| Symptom | Stomach ache | 1.57 | Anxiety | 1.10 |
| Symptom | Xerostomia | 1.49 | Weight loss | 1.03 |
| Symptom | Abdominal pain | 1.46 | Pain | 1.01 |
| Condition | Lactose intolerance | 3.05 | Autoimmunity | 3.21 |
| Condition | Inflammatory bowel disease | 2.58 | Attention-deficit hyperactivity disorder | 1.64 |
| Condition | Peptic ulcer | 2.22 | Gastroesophageal reflux disease | 1.33 |
| Condition | Irritable bowel syndrome | 2.19 | Asthma | 1.30 |
| Condition | Food intolerance | 2.10 | Influenza | 1.24 |
| Condition | Crohn disease | 2.02 | Migraine | 1.22 |
| Condition | Digestive disease | 1.96 | Colitis | 1.21 |
| Condition | Polycystic ovary syndrome | 1.95 | Systemic lupus erythematosus | 1.13 |
| Condition | Peritonitis | 1.92 | Alzheimer disease | 1.13 |

Only 31 users identified themselves as having celiac disease based on a self-identified query (eg, “I have celiac”). Among people with a sustained interest in celiac disease, 0.12% (8/6528) were those who identified themselves as having celiac disease compared with 0.03% (31/90142) in the general population (a ratio of 3.5). Thus, similar to the findings of Ofran et al [20], a sustained interest in celiac disease can be considered a proxy for having the condition or for a caregiver of a patient.

We attempted to identify users with either a passing or sustained interest in celiac disease, comparing them with all other users based on antecedent queries. In both cases, the area under the receiver operating characteristic curve was 0.53 or less, indicating that we could not distinguish the 2 classes based on symptoms and conditions searched.
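For reference, the area under the receiver operating characteristic curve equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. A small pairwise sketch of that definition (the function name is hypothetical, and the quadratic loop is suitable only for small samples):

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: sequence of 0/1 class labels; scores: model outputs, same length.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

A value of 0.5 corresponds to chance-level ranking, which is why a value of 0.53 indicates almost no discriminatory ability.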

To investigate the symptoms most associated with the initiation of CRQs, we compared the ratio between the number of times that each symptom or condition was queried in the 14 days preceding the first CRQ of each person and the number of times it was mentioned from 14 days before the first CRQ until 14 days after it. Table 1 shows the symptoms and conditions with the highest before-to-after ratio.
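A before-to-after query ratio can be sketched as below. This sketch takes the denominator to be the count in the 14 days after the first CRQ, as in the Methods summary of the abstract; that reading, the exclusion of day zero, and the function name are assumptions.

```python
from collections import Counter


def query_ratios(events, window=14):
    """Return {term: mentions before / mentions after} around the first CRQ.

    events: iterable of (term, offset_days), where offset_days is the day of
    the mention relative to that user's first CRQ (negative means before).
    Day zero is ignored, and terms with no mentions afterward are omitted.
    """
    before, after = Counter(), Counter()
    for term, offset in events:
        if -window <= offset < 0:
            before[term] += 1
        elif 0 < offset <= window:
            after[term] += 1
    return {term: before[term] / after[term] for term in after}
```

A ratio above 1 means the term was queried more often before the first CRQ than after it.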

Figure 1 shows the fraction of queries for “diarrhea,” the top-ranked symptom among sustained users, over time compared with the fraction of all queries. As the figure shows, interest in this symptom begins to rise only approximately 2 weeks before the first CRQ and rises dramatically in the few days before it.

As noted above, we identified queries for recipes and compared these recipes from 14 days before the first CRQ with recipes from 14 days after it. Table 2 shows the recipes and ingredients that increased in searches and those that decreased.

Figure 1. Fraction of queries for diarrhea over time, compared with the fraction of all queries. Day zero is the first celiac-related query. A ratio greater than 1 indicates that the queries for diarrhea are more common than could be expected. CRQ: celiac-related query.

Table 2. Recipes and food ingredients that increased and those that decreased in the 14 days after the first celiac-related query, compared with the preceding 14 days.

| Category | Increased | Decreased |
|---|---|---|
| Foods | Gluten-free pie crust | Honey cake |
| Foods | Gluten-free pumpkin bread | Earthquake cake |
| Foods | Gluten-free banana bread | Cucumber salad |
| Foods | Gluten-free peanut butter cookies | Egg salad |
| Foods | Gluten-free chocolate chip cookies | Pasta salad |
| Foods | Gluten-free pancakes | Broccoli salad |
| Foods | Roasted pumpkin seeds | Fish tacos |
| Foods | Pumpkin soup | Ratatouille |
| Foods | Cinnamon rolls | Tomato pie |
| Foods | Pumpkin muffins | Tuna noodle casserole |
| Ingredients | Bean flour | All-purpose flour |
| Ingredients | Brown rice flour | Dark rum |
| Ingredients | White rice flour | Anisette |
| Ingredients | Potato starch flour | Gin |
| Ingredients | Rice flour | Serrano chile |
| Ingredients | Xanthan gum | Peach schnapps |
| Ingredients | Soy flour | Cherry |
| Ingredients | White sugar | Gelatin |
| Ingredients | Ground walnuts | Sunflower seeds |

In this analysis of search engine queries, we found that symptoms known to be associated with celiac disease were searched for in the days preceding a first-time search for celiac disease or gluten. The symptom most likely to be queried more before than after a celiac disease query was diarrhea, a common clinical manifestation of celiac disease [21]. Though a variety of other symptoms and associated conditions were also queried before a CRQ, there was no combination of terms that resulted in an area under the curve of high discriminatory value. This lack of a discriminatory symptom set is in contrast to a prior analysis of this search engine investigating symptoms preceding a diagnosis of pancreatic cancer, which found that a set of search terms can identify 5% to 15% of patients with likely pancreatic adenocarcinoma while maintaining a low false-positive rate [13]. The lack of a consistent set of symptoms preceding interest in celiac disease or gluten-related disorders is congruent with a recent study evaluating medical records that found that clinical manifestations and associated diseases were largely ineffective at distinguishing patients with and without celiac disease [22]. Case finding (as opposed to population screening) is a widely accepted approach to identifying patients with celiac disease; however, a symptom-based approach appears to be unable to effectively distinguish patients with celiac disease from the general population, and this is borne out by our analysis.

The majority of patients with celiac disease now present without diarrhea and instead with other intestinal or extraintestinal symptoms [8]. Nevertheless, no single nonclassical manifestation is more common than diarrhea as a presenting feature, and thus, the plurality of patients have diarrhea, which might account for this symptom being the most likely to be mentioned before a CRQ [8]. Among the so-called nonclassical presentations, we found that queries for bloating and gastroesophageal reflux were associated with subsequent sustained queries for celiac disease. Among extraintestinal symptoms, headache, anxiety, depression, and attention-deficit hyperactivity disorder were associated with subsequent CRQs, raising the possibility that neuropsychiatric symptoms are a more prominent set of clinical features in celiac disease than is generally recognized. Patients with celiac disease have a greater risk of health care visits for headache both before and after celiac disease diagnosis [23]. Most studies have also found an association between both anxiety and depression and celiac disease [24], and the presence of depression appears to modify the relationship between adherence to the gluten-free diet and the severity of celiac disease–related symptoms [25]. Our findings suggest that these neuropsychiatric symptoms may be a prominent feature among individuals before they seek celiac disease testing.

In addition to the known intestinal and extraintestinal conditions and associated diseases, our analysis also yielded unexpected associations with subsequent CRQs, including cough, asthma, bleeding, influenza, itch, colitis, and Alzheimer disease (Table 1). Though cough and asthma are not thought to be a common manifestation of celiac disease, patients with celiac disease are somewhat more likely to have asthma [26], and several conditions that feature cough are also associated with celiac disease, including pneumococcal pneumonia and influenza [27,28]. Itch may be associated with celiac-specific queries because of dermatitis herpetiformis, a gluten-induced blistering rash that can be intensely pruritic [29]. Colitis may be associated with celiac-associated queries because of the known association between celiac disease and lymphocytic and collagenous colitis, 2 forms of microscopic colitis that may improve after the adoption of a gluten-free diet [30].

To our knowledge, this is the first study to analyze individual search engine data to identify antecedent symptoms in those who subsequently express an interest in celiac disease. A prior study analyzing regional patterns of Google searches for the gluten-free diet found that location-derived sociodemographic factors such as median income and proportion of residents who are non-Hispanic white were associated with an increased rate of searches for the gluten-free diet as compared with other diets [31]. In this study, we were able to analyze individual-level search data, allowing us to draw inferences about the variety of symptoms that precede awareness of celiac disease or gluten as a possible underlying cause. The use of search engine queries allows us to evaluate symptoms that may be embarrassing for individuals to report to health care practitioners or on a traditional questionnaire [12]. Another strength of this study was its large sample size, encompassing over 6500 individuals who exhibited a sustained interest by performing a CRQ on more than 1 day.

This study also has a number of limitations. We are unable to distinguish individuals with diagnosed celiac disease from those merely suspecting celiac disease. We attempted to mitigate this by analyzing transient versus persistent interest, as those with persistent interest are more likely to have received a diagnosis of celiac disease; nevertheless, search engine analysis is unable to distinguish celiac disease from nonceliac gluten sensitivity. Though self-identified queries are rare, they can be useful to validate more widely used queries, and we did find that CRQs were highly correlated with a self-identified celiac query (eg, “I have celiac”) when compared with the general population; nevertheless, avoidance of gluten is far more common than diagnosed celiac disease [3]. Individuals with nonceliac gluten sensitivity have beliefs and attitudes that differ from those with celiac disease with regard to the health effects of gluten, the safety of genetically modified organisms, and other issues [32]. Our findings of multiple differences in queried recipes, including a rise in gluten-free baked goods, suggest that regardless of their celiac disease status, users with a CRQ are changing their dietary habits, at least in the short term. Regulations prohibit analysis of search engine query data beyond 18 months, which is considerably shorter than the latency period reported between symptom onset and celiac disease diagnosis (a mean of 11 years) in questionnaire studies [9].

In conclusion, in this analysis of celiac-related internet queries, we found an increase in antecedent searches for symptoms known to be associated with celiac disease such as diarrhea, bloating, and weight loss; a rise in searches for depression and anxiety; and an increase in symptoms that are associated with celiac disease but may not be reported to health care providers. We also found that the protean clinical manifestations of celiac disease are reflected in the diffuse nature of antecedent internet queries, underscoring the challenge of effective case-finding strategies. Future studies should investigate the unexpected associations found with CRQs in this study as well as the prevalence and natural history of neuropsychiatric symptoms in patients at the time of celiac disease diagnosis.

Conflicts of Interest

EYT is an employee of Microsoft, owner of Bing. BL has no conflicts of interest to disclose.

  1. Lebwohl B, Sanders DS, Green PH. Coeliac disease. Lancet 2018 Dec 6;391(10115):70-81.
  2. Rubio-Tapia A, Kyle RA, Kaplan EL, Johnson DR, Page W, Erdtmann F, et al. Increased prevalence and mortality in undiagnosed celiac disease. Gastroenterology 2009 Jul;137(1):88-93.
  3. Kim H, Patel K, Orosz E, Kothari N, Demyen MF, Pyrsopoulos N, et al. Time trends in the prevalence of celiac disease and gluten-free diet in the US population: results from the National Health and Nutrition Examination Surveys 2009-2014. JAMA Intern Med 2016 Dec 1;176(11):1716-1717.
  4. Biesiekierski J, Newnham E, Irving P, Barrett JS, Haines M, Doecke JD, et al. Gluten causes gastrointestinal symptoms in subjects without celiac disease: a double-blind randomized placebo-controlled trial. Am J Gastroenterol 2011 Mar;106(3):508-14; quiz 515.
  5. Davis W. Wheat Belly: Lose the Wheat, Lose the Weight, and Find Your Path Back to Health. New York: Rodale; 2011.
  6. David P. Grain Brain: The Surprising Truth about Wheat, Carbs, and Sugar--Your Brain's Silent Killers. New York: Little, Brown And Co; 2019.
  7. Blackett JW, Shamsunder M, Reilly NR, Green PH, Lebwohl B. Characteristics and comorbidities of inpatients without celiac disease on a gluten-free diet. Eur J Gastroenterol Hepatol 2018 Apr;30(4):477-483.
  8. Lo W, Sano K, Lebwohl B, Diamond B, Green PH. Changing presentation of adult celiac disease. Dig Dis Sci 2003 Feb;48(2):395-398.
  9. Green P, Stavropoulos SN, Panagi SG, Goldstein SL, Mcmahon DJ, Absan H, et al. Characteristics of adult celiac disease in the USA: results of a national survey. Am J Gastroenterol 2001 Jan;96(1):126-131.
  10. Paez MA, Gramelspacher AM, Sinacore J, Winterfield L, Venu M. Delay in diagnosis of celiac disease in patients without gastrointestinal complaints. Am J Med 2017 Dec;130(11):1318-1323.
  11. Ludvigsson JF, Pathak J, Murphy S, Durski M, Kirsch PS, Chute CG, et al. Use of computerized algorithm to identify individuals in need of testing for celiac disease. J Am Med Inform Assoc 2013 Dec;20(e2):e306-e310.
  12. Yom-Tov E. Crowdsourced Health: How What You Do on the Internet Will Improve Medicine. Cambridge, MA: MIT Press; 2016.
  13. Paparrizos J, White RW, Horvitz E. Screening for pancreatic adenocarcinoma using signals from web search logs: feasibility study and results. J Oncol Pract 2016 Dec;12(8):737-744.
  14. Rosenblum S, Yom-Tov E. Seeking web-based information about attention deficit hyperactivity disorder: where, what, and when. J Med Internet Res 2017 Dec 21;19(4):e126.
  15. Yom-Tov E, Borsa D, Cox IJ, McKendry RA. Detecting disease outbreaks in mass gatherings using internet data. J Med Internet Res 2014 Jun 18;16(6):e154.
  16. Yom-Tov E, Gabrilovich E. Postmarket drug surveillance without trial costs: discovery of adverse drug reactions through large-scale analysis of web search queries. J Med Internet Res 2013 Jun 18;15(6):e124.
  17. Yom-Tov E, Borsa D, Hayward AC, McKendry RA, Cox IJ. Automatic identification of web-based risk markers for health events. J Med Internet Res 2015 Jan 27;17(1):e29.
  18. Giat E, Yom-Tov E. Evidence from web-based dietary search patterns to the role of B12 deficiency in non-specific chronic pain: a large-scale observational study. J Med Internet Res 2018 Dec 5;20:e4.
  19. Stork DG, Hart PE, Duda RO. Pattern Classification. New York: Wiley; 1973.
  20. Ofran Y, Paltiel O, Pelleg D, Rowe J, Yom-Tov E. Patterns of information-seeking for cancer on the internet: an analysis of real world data. PLoS One 2012;7:e45921.
  21. Reilly NR, Fasano A, Green PH. Presentation of celiac disease. Gastrointest Endosc Clin N Am 2012 Oct;22(4):613-621.
  22. Hujoel IA, Van Dyke CT, Brantner T, Larson J, King KS, Sharma A, et al. Natural history and clinical detection of undiagnosed coeliac disease in a North American community. Aliment Pharmacol Ther 2018 May;47(10):1358-1366.
  23. Lebwohl B, Roy A, Alaedini A, Green PH, Ludvigsson JF. Risk of headache-related healthcare visits in patients with celiac disease: a population-based observational study. Headache 2016 May;56(5):849-858.
  24. Zingone F, Swift G, Card T, Sanders D, Ludvigsson J, Bai J. Psychological morbidity of celiac disease: a review of the literature. United European Gastroenterol J 2015 Apr;3:A.
  25. Joelson A, Geller M, Zylberberg H, Green P, Lebwohl B. The effect of depressive symptoms on the association between gluten-free diet adherence and symptoms in celiac disease: analysis of a patient powered research network. Nutrients 2018 Apr 26;10:E538.
  26. Ludvigsson J, Hemminki K, Wahlstrom J, Almqvist J. Celiac disease confers a 1.6-fold increased risk of asthma: a nationwide population-based cohort study. J Allergy Clin Immunol 2011 Apr;127:1071-1073.
  27. Simons M, Scott-Sheldon L, Risech-Neyman Y, Moss S, Ludvigsson J, Green P. Celiac disease and increased risk of pneumococcal infection: a systematic review and meta-analysis. Am J Med 2018 Jan;131:83-89.
  28. Marild K, Fredlund H, Ludvigsson K. Increased risk of hospital admission for influenza in patients with celiac disease: a nationwide cohort study in Sweden. Am J Gastroenterol 2010 Nov;105:2465-2473.
  29. Reunala T, Salmi T, Hervonen K, Kaukinen K, Collin P. Dermatitis herpetiformis: a common extraintestinal manifestation of coeliac disease. Nutrients 2018 May 12;10:E602.
  30. Stewart M, Andrews C, Urbanski S, Beck P, Storr M. The association of coeliac disease and microscopic colitis: a large population-based study. Aliment Pharmacol Ther 2011;33:1340-1349.
  31. Laszkowska M, Shiwani H, Belluz J, Ludvigsson JF, Green PH, Sheehan D, et al. Socioeconomic vs health-related factors associated with Google searches for gluten-free diet. Clin Gastroenterol Hepatol 2018 Feb;16(2):295-297.
  32. Rabinowitz LG, Zylberberg HM, Levinovitz A, Stockwell MS, Green PH, Lebwohl B. Skepticism regarding vaccine and gluten-free food safety among patients with celiac disease and non-celiac gluten sensitivity. Dig Dis Sci 2018 May;63(5):1158-1164.

CRQ: celiac-related queries
QR: query ratio

Edited by G Eysenbach; submitted 11.12.18; peer-reviewed by F Montanaro, C Elena, R Verma, V Gianfredi; comments to author 31.01.19; revised version received 05.02.19; accepted 11.02.19; published 08.04.19


©Benjamin Lebwohl, Elad Yom-Tov. Originally published in the Journal of Medical Internet Research, 08.04.2019.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication, as well as this copyright and license information must be included.