Guess Who’s Not Coming to Dinner? Evaluating Online Restaurant Reservations for Disease Surveillance
Background: Alternative data sources are used increasingly to augment traditional public health surveillance systems. Examples include over-the-counter medication sales and school absenteeism.
Objective: We sought to determine if an increase in restaurant table availabilities was associated with an increase in disease incidence, specifically influenza-like illness (ILI).
Methods: Restaurant table availability was monitored using OpenTable, an online restaurant table reservation site. A daily search was performed for restaurants with available tables for 2 at the hour and at half past the hour for 22 distinct times: between 11:00 am and 3:30 pm for lunch and between 6:00 pm and 11:30 pm for dinner. In the United States, we examined table availability for restaurants in Boston, Atlanta, Baltimore, and Miami. For Mexico, we studied table availabilities in Cancun, Mexico City, Puebla, Monterrey, and Guadalajara. Time series of restaurant use were compared with Google Flu Trends and ILI at the state and national levels for the United States and Mexico using the cross-correlation function.
Results: Differences in restaurant use were observed across sampling times and regions. We also noted similarities in time series trends between data on influenza activity and restaurant use. In some settings, significant correlations greater than 70% were noted between data on restaurant use and ILI trends.
Conclusions: This study introduces and demonstrates the potential value of restaurant use data for event surveillance.
(J Med Internet Res 2014;16(1):e22)
Global adoption of the Internet and mobile phone technologies has proven useful for gathering and disseminating data. Various novel data streams using these technologies have been explored as tools for augmenting traditional public health disease surveillance systems. These novel systems typically aim to improve the detection and monitoring of outbreaks in addition to disseminating information to the public and to public health professionals. Examples include search query volume [1], and digital surveillance systems harnessing news reports and social media, such as HealthMap [2,3] and Global Public Health Intelligence Network (GPHIN) [4]. Other innovative surveillance systems have explored the use of over-the-counter medication sales [5], telephone triage records [6], and school absenteeism [7]. The usefulness of these alternative data sources has been evaluated in several studies, especially for monitoring seasonal and pandemic influenza (eg, Besculides et al [8], Vergu et al [9], Yih et al [10], and Bernardo et al [11]). Another example includes the use of digital surveillance systems to mine early reports of an outbreak of acute respiratory infections, which later evolved into a pandemic in 2009 [12]. Similarly, during the recent 2013 H7N9 influenza outbreak in China, social media sites, such as Twitter and Sina Weibo (a Chinese social network site similar to Twitter), provided near real-time information on disease activity [13].
Similar to school absenteeism, over-the-counter medication sales, and volume of telephone triage service data, utilization of online restaurant reservation sites could also serve as a tool for event surveillance. Studies have noted that the percentage of meals consumed outside the home in the United States has increased [14-16]. Therefore, monitoring changes in restaurant use could possibly serve as a leading indicator of disruption resulting from social unrest, including a public health event. In particular, a decrease in restaurant use could serve as an early indicator of a disease-related event. In this study, we evaluated whether a rise in restaurant table availability was associated with an increase in influenza-like illness (ILI).
All data on restaurant use were obtained from OpenTable [17], an online platform where individuals can make table reservations at restaurants with availabilities at different times of the day. The site caters to restaurants in various cities in several different countries with more than 28,000 restaurants in the database at the time of this writing. The number of registered restaurants is available for each city/region and varies over time as new restaurants join and existing ones either close or cancel their registration.
Each day from September 4, 2012 to April 30, 2013, at set times around lunch and dinner, we conducted a search to determine the number of restaurants with tables available for 2 people. To accommodate differences in regions and eating habits, we defined the lunch period as between 11:00 am and 3:30 pm and dinner as between 6:00 pm and 11:30 pm. According to OpenTable policy, customers can cancel reservations up to 30 minutes before the reserved time. Therefore, we searched for restaurants with table availabilities 15 minutes before the times of interest. For example, for reservations at 2:00 pm, we searched for available tables every day at 1:45 pm. The times of interest fell on the hour and at half past the hour, which gives 10 lunch times and 12 dinner times. This resulted in 22 distinct search times each day for each of the 10 study regions in the United States and Mexico. In the United States, table availability was examined for restaurants in Boston (Massachusetts), Atlanta (Georgia), Baltimore (Maryland), and Miami (Florida). For Mexico, we monitored table availabilities in Cancun (Quintana Roo), Mexico City (Distrito Federal), Puebla (Puebla), Monterrey (Nuevo León), Guadalajara (Jalisco), and the whole of Mexico. Since data were collected every day at the specified times, our observations formed time series curves of availabilities. Monitoring 10 regions at 22 search times resulted in 220 distinct time series.
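The sampling schedule described above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the OpenTable query itself is not shown because the text does not describe how the search was automated.

```python
from datetime import datetime, timedelta


def half_hour_slots(start, end):
    """Return every time from start to end (inclusive) in 30-minute steps."""
    slots = []
    current = start
    while current <= end:
        slots.append(current)
        current += timedelta(minutes=30)
    return slots


def build_schedule(day, lead_minutes=15):
    """Pair each reservation slot with the moment its availability search runs.

    The 15-minute lead follows the text: searches run after the 30-minute
    cancellation cutoff but before the reservation time itself.
    """
    lunch = half_hour_slots(day.replace(hour=11, minute=0),
                            day.replace(hour=15, minute=30))
    dinner = half_hour_slots(day.replace(hour=18, minute=0),
                             day.replace(hour=23, minute=30))
    lead = timedelta(minutes=lead_minutes)
    return [(slot - lead, slot) for slot in lunch + dinner]


# First day of the study period.
schedule = build_schedule(datetime(2012, 9, 4))
for search_time, slot in schedule[:3]:
    print(search_time.strftime("%H:%M"), "->", slot.strftime("%H:%M"))
```

Enumerating both windows this way yields 10 lunch slots and 12 dinner slots per region per day, each searched 15 minutes before the reservation time.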
Comparison to Data on Influenza Activity
By using data from the recent 2012-2013 severe influenza season, we tested the hypothesis that an increase in influenza activity was associated with a rise in restaurants with table availabilities. Since the number of restaurants in the system varied over time, we focused on the proportion of restaurants with available tables. For each region, the proportion of restaurants with available tables was defined as the number of restaurants with availabilities at time t divided by the total number of restaurants on OpenTable at time t. First, we examined the data to better understand trends in table availability during the baseline period. The baseline period was from September to October 2012 because influenza season typically runs from November to April in the northern hemisphere [18]. Observations on restaurant use during the baseline period could suggest best times for surveillance. Next, we calculated the average weekly proportion of restaurants with table availability at each sampling time and compared these data to weekly estimates of ILI. Cross-correlations between time series could be affected by bias because of temporal autocorrelation [19]. Bias in this study could be due to low-frequency patterns resulting from fewer restaurants being open at particular hours of the day. Therefore, we applied prewhitening by fitting an autoregressive integrated moving average (ARIMA) model to availabilities, and then filtering the ILI values using the fitted model. The ARIMA model can be described as follows:
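One standard way to write such a model, given here as a hedged sketch rather than the exact expression used in the analysis, is the autoregressive moving-average recursion applied to the series after any differencing. The orders p and q and the coefficients \phi_i and \theta_j are notation assumed for this illustration:

```latex
y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \sum_{j=1}^{q} \theta_j \, z_{t-j} + z_t
```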
where c is a constant, y_t is the observation at time t, y_{t-p} are lagged values of the series, and z_t is a white noise process. Correlations were then examined between the residuals of the availabilities model and the filtered ILI values using the cross-correlation function (CCF). Data representing ILI activity were obtained from state surveillance systems [20-23], Google Flu Trends [24], and the Pan American Health Organization (PAHO) [25]. Google Flu Trends data were available at the city level for cities in the United States and at the province level for Mexico. We calculated correlations between city-level Google Flu Trends and availabilities data for cities in the United States. Due to the unavailability of data at the city level, we estimated correlations between Google Flu Trends state-level data and availabilities for the various cities in Mexico. PAHO’s estimated percent positive for influenza data was only available at the country level for Mexico. Weekly percent ILI (% ILI), resulting from physician visits, was also available for all states in the United States. Additionally, for illustrative purposes, local polynomial regression fitting (LOESS) was used in smoothing curves presented in the Results. Smoothing was performed to capture the overall trend of the curves for comparison purposes. See Cleveland [26] for additional information on the LOESS model. Bonferroni adjustment was also applied to account for multiple comparisons as needed. The analysis was performed in R (The R Foundation for Statistical Computing, Vienna, Austria).
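The prewhitening and cross-correlation steps can be illustrated with a short Python sketch. The analysis itself was performed in R, so this is not the authors' code. As a simplifying assumption, a first-order autoregressive filter stands in for the full ARIMA fit, and all function names and the synthetic series are hypothetical.

```python
import random
from statistics import mean


def fit_ar1(series):
    """Least-squares estimate of the AR(1) coefficient on the centered series."""
    m = mean(series)
    centered = [v - m for v in series]
    num = sum(centered[t] * centered[t - 1] for t in range(1, len(centered)))
    den = sum(c * c for c in centered[:-1])
    return num / den


def ar1_filter(series, phi):
    """Apply x_t - phi * x_{t-1} to the centered series (drops the first point)."""
    m = mean(series)
    centered = [v - m for v in series]
    return [centered[t] - phi * centered[t - 1] for t in range(1, len(centered))]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


def cross_correlation(a, b, lag):
    """Correlation between a[t] and b[t + lag] for lag >= 0 (b trails a)."""
    if lag == 0:
        return pearson(a, b)
    return pearson(a[:-lag], b[lag:])


def prewhitened_ccf(availability, ili, lags=(0, 1)):
    """Fit the filter on availability, apply it to both series, then correlate."""
    phi = fit_ar1(availability)
    residuals = ar1_filter(availability, phi)
    filtered_ili = ar1_filter(ili, phi)
    return {k: cross_correlation(residuals, filtered_ili, k) for k in lags}


# Synthetic data only: an autocorrelated "availability" series and an "ILI"
# series that repeats it one step later with a little added noise.
random.seed(0)
availability = [0.0]
for _ in range(199):
    availability.append(0.6 * availability[-1] + random.gauss(0, 1))
ili = [availability[0]] + [
    availability[t - 1] + 0.1 * random.gauss(0, 1) for t in range(1, 200)
]
result = prewhitened_ccf(availability, ili)
print(result)
```

In this sketch a positive lag means the second series trails the first. Sign conventions for lags differ between software packages, so the direction should be checked before interpreting a lagged correlation.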
Data Summary: Baseline Period
Table availabilities varied by mealtime (lunch and dinner) and by city. Times with lowest availabilities in some cities could represent preferred dining times. Trends observed during preferred dining times should also be most affected by seasonal deviations in dining. Hereafter, we refer to times with lowest availabilities as “most preferred” and times with highest availabilities as “least preferred.” The most- and least-preferred dining times varied by region. Use of t tests suggested significant differences (P<.001) in availabilities between most- and least-preferred times across all regions. Additionally, mean availabilities at the most- and least-preferred times were also significantly different at lunch and dinner for most regions (P<.05), except for lunch in Monterrey (P=.23) and Puebla (P=.99). Times with the lowest mean proportion of restaurants with available tables at lunch were Atlanta at 2:00 pm (mean 0.608, SD 0.005), Miami at noon (mean 0.673, SD 0.016), Boston at noon (mean 0.594, SD 0.009), Baltimore at noon (mean 0.589, SD 0.007), Mexico City at 3:30 pm (mean 0.757, SD 0.044), Cancun at 12:30 pm (mean 0.709, SD 0.048), Guadalajara at 3:30 pm (mean 0.818, SD 0.06), Monterrey at noon (mean 0.824, SD 0.055), Puebla at noon (mean 0.963, SD 0.035), and Mexico at 3:30 pm (mean 0.753, SD 0.035). All regions had the lowest proportion of restaurants with available tables for dinner between 10:30-11:30 pm, which could be because of fewer restaurants being open later in the night, especially for US cities. The mean proportion of restaurants with table availability for the entire study period (September 2012-April 2013) is summarized in the accompanying table.
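The baseline summary can be illustrated with a minimal Python sketch that computes the proportion of restaurants with available tables, its mean and SD at each sampling time, and the resulting most- and least-preferred times. The counts are invented for illustration and the function names are hypothetical.

```python
from statistics import mean, stdev


def summarize_by_time(daily_counts):
    """Map each sampling time to the (mean, SD) of its daily availability proportion.

    daily_counts: {time_label: [(restaurants_available, restaurants_registered), ...]}
    """
    summary = {}
    for label, counts in daily_counts.items():
        proportions = [available / registered for available, registered in counts]
        summary[label] = (mean(proportions), stdev(proportions))
    return summary


def preferred_times(summary):
    """Lowest mean availability is 'most preferred'; highest is 'least preferred'."""
    most = min(summary, key=lambda label: summary[label][0])
    least = max(summary, key=lambda label: summary[label][0])
    return most, least


# Hypothetical baseline counts for one city at three lunch sampling times.
baseline = {
    "12:00": [(118, 200), (120, 201), (119, 199)],
    "14:00": [(150, 200), (152, 201), (149, 199)],
    "15:30": [(170, 200), (171, 201), (168, 199)],
}
summary = summarize_by_time(baseline)
most, least = preferred_times(summary)
print("most preferred:", most, "least preferred:", least)
```

Dividing by the number of registered restaurants on the same day keeps the measure comparable as restaurants join or leave the site.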
Comparison to Data on Influenza Activity
We calculated 300 correlations between data representing ILI activity and restaurant use. Significant correlations (P<.05) between the restaurant table availability data for each region and percent positive for influenza from PAHO, percent ILI from states in the United States, and Google Flu Trends are summarized in the accompanying table. The correlations presented there were calculated from prewhitened time series data. We present correlations at weekly lags of 0 and 1 because significant correlations at lag 0 suggest that increases in ILI were associated with immediate increases in availabilities. On the other hand, significant correlations at a 1-week lag indicate that increases in availabilities were followed by a noted increase in ILI. Understanding the correlation at different lags can inform the use of restaurant utilization data for modeling and forecasting future ILI activity. We present results only up to lag 1 because the data are at a weekly resolution and studies have suggested that the mean duration of influenza symptoms is presumably less than 7 days [27].
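For reference, a Bonferroni adjustment over a family of correlations can be sketched as below. The choice of family size is an assumption made for illustration (the text says only that the adjustment was applied as needed), the P values are invented, and the function name is hypothetical.

```python
def bonferroni_flags(p_values, alpha=0.05, n_tests=None):
    """Return the adjusted threshold and a per-test significance flag.

    n_tests defaults to the number of P values supplied; pass it explicitly
    when the values shown are a subset of a larger family of tests.
    """
    family_size = len(p_values) if n_tests is None else n_tests
    threshold = alpha / family_size
    return threshold, [p < threshold for p in p_values]


# Hypothetical P values, judged against a family of 300 correlations.
threshold, flags = bonferroni_flags([0.00005, 0.001, 0.04], n_tests=300)
print(threshold, flags)
```

With 300 tests the per-test threshold is 0.05/300, roughly 0.00017, so a correlation that is significant at P<.05 on its own may not survive the adjustment.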
As noted, all regions had the lowest proportion of restaurants with available tables for dinner between 10:30-11:30 pm during the baseline period. As seen in the accompanying table, these were also the times with the most significant correlations for Miami, Mexico City, and Mexico. Note that most of the dinnertime correlations for Atlanta and Baltimore were observed earlier in the evening compared to Miami. Overall, the highest correlations (>70%) were recorded for Atlanta at 12:30 pm and 3:00 pm at lag 0, and Baltimore at 7 pm at 1-week lag. More significant correlations were also noted between restaurant table availability and Google Flu Trends data compared to PAHO percent positive for influenza and state ILI data (see the accompanying table). Significant correlations were recorded between PAHO data and restaurant table availabilities for Mexico at 4 set times. As the table also shows, a few correlations were observed between restaurant table availabilities in US cities and state ILI percentages. State ILI trends are not always identical to trends at the city level, which could explain the lack of significant correlations. No significant correlations were noted for Cancun, Guadalajara, Puebla, and Boston.
We present a sample plot, in the accompanying figure, showing a case in which the trend in restaurant table availability appears similar to the trend in estimated ILI activity for Miami. Note the dips in the curves before the peak observed during the weeks of Thanksgiving and Christmas holidays. The drop observed after the peak occurred during the week of Valentine’s Day. As seen in the figure, the overall trend in the data for availabilities at 10 pm for Miami was similar to that observed with Google Flu Trends, with the peak observed later. This could suggest that an increase in availabilities was observed after a rise in ILI-related queries. Further, 9 graphs representing trends observed during sampling times with the highest correlations for Baltimore, Atlanta, and Mexico are shown in a separate multi-panel figure. Note the similarities and differences between the curves. In some cases, such as panels G-I, several of the peaks and troughs in the influenza data are captured by the data on restaurant utilization.
In this paper, we introduced an easily accessible Internet-based data stream, online restaurant reservations, and demonstrated its potential value for event surveillance. More specifically, we observed significant correlations between restaurant table availability and Google Flu Trends, and influenza activity data at the city and country level in the United States and Mexico. In most cases, associations between restaurant use and measures of influenza activity were stronger when all data were at the same geographic resolution. For instance, correlations between restaurant use in Miami and Google Flu Trends data for Miami were stronger than correlations between ILI data for Florida and restaurant use data for Miami. This tendency is explained at least in part by the known variation in ILI trends when measured at different geographical resolutions (eg, city, state, country) [28,29]. We also observed the highest correlations (>70%) for Atlanta at 12:30 pm and 3:00 pm, and for Baltimore at 7 pm. Dinner times with significant correlations were observed later in the evening for regions in the south compared to others. These differences in observations across regions could potentially be explained by differences in dining habits and demographic differences, which could also affect both dining habits and influenza spread [30,31]. Other potential factors include social disruptions, socioeconomic factors, natural disasters, and foodborne illnesses. In addition to modeling the data for providing estimates of ILI before the release of official reports, in future studies we would also investigate any occurrences of social unrest and natural disasters, which might have affected the trend in the time series.
Although possibly useful, there are several limitations inherent to the study and the data source impeding a full exploration of this system’s potential. Limitations include differentiating between seasonal changes and changes potentially resulting from a disease-related event. In addition, data on reservation cancellations would be most suitable. However, these data are currently unavailable. This issue could be easily remediated by creating a partnership with restaurants such that occurrences of and reasons for cancellations are recorded through a survey system. Furthermore, only samples of restaurants in each region are listed on OpenTable.
Despite these limitations, this preliminary analysis suggests that monitoring trends in restaurant table availabilities and cancellations could be useful for detecting social disruption, including disease-related events. Moreover, unlike school absenteeism, over-the-counter medication sales, and volume of telephone triage service data, which are traditionally difficult to access, reservation use data can easily be obtained from reservation sites. The global penetration of the Internet also suggests that such data sources could be easily harvested in the future. These novel data sources could serve as a stepping-stone to prompt further investigation of disease events if warranted. Observations made using these data can be further investigated by comparing trends to other alternative sources for disease surveillance, especially in situations where official reports on disease activity are delayed. Additionally, this data source can be fused with more traditional data streams for epidemic intelligence using ensemble modeling approaches.
We thank Sumiko Mekaru for suggestions regarding data analysis. This work is partially supported by a research grant from the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center (DoI/NBC) contract number D12PC000337. The US Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/NBC, or the US Government.
All authors conceived and designed the study. Elaine O Nsoesie analyzed the data. All authors drafted, edited, and approved the final version of the paper. The sponsors had no role in the design, analysis, and writing of this manuscript.
Conflicts of Interest
1. Polgreen PM, Chen Y, Pennock DM, Nelson FD. Using internet searches for influenza surveillance. Clin Infect Dis 2008 Dec 1;47(11):1443-1448.
2. HealthMap. URL: http://healthmap.org/en/ [accessed 2013-11-27]
3. Brownstein JS, Freifeld CC, Madoff LC. Digital disease detection--harnessing the Web for public health surveillance. N Engl J Med 2009 May 21;360(21):2153-5, 2157.
4. Mykhalovskiy E, Weir L. The Global Public Health Intelligence Network and early warning outbreak detection: a Canadian contribution to global public health. Can J Public Health 2006 Jan;97(1):42-44.
5. Magruder SF. Evaluation of over-the-counter pharmaceutical sales as a possible early warning indicator of human disease. Johns Hopkins University APL Technical Digest 2003;24:349-353.
6. Espino JU, Hogan WR, Wagner MM. Telephone triage: a timely data source for surveillance of influenza-like diseases. AMIA Annu Symp Proc 2003:215-219.
7. Peterson D, Andrews JS, Levy BS, Mitchell B. An effective school-based influenza surveillance system. Public Health Rep 1979;94(1):88-92.
8. Besculides M, Heffernan R, Mostashari F, Weiss D. Evaluation of school absenteeism data for early outbreak detection, New York City. BMC Public Health 2005;5:105.
9. Vergu E, Grais RF, Sarter H, Fagot JP, Lambert B, Valleron AJ, et al. Medication sales and syndromic surveillance, France. Emerg Infect Dis 2006 Mar;12(3):416-421.
10. Yih WK, Teates KS, Abrams A, Kleinman K, Kulldorff M, Pinner R, et al. Telephone triage service data for detection of influenza-like illness. PLoS One 2009;4(4):e5260.
11. Bernardo TM, Rajic A, Young I, Robiadek K, Pham MT, Funk JA. Scoping review on search queries and social media for disease surveillance: a chronology of innovation. J Med Internet Res 2013;15(7):e147.
12. Brownstein JS, Freifeld CC, Madoff LC. Influenza A (H1N1) virus, 2009--online monitoring. N Engl J Med 2009 May 21;360(21):2156.
13. Salathé M, Freifeld CC, Mekaru SR, Tomasulo AF, Brownstein JS. Influenza A (H7N9) and the importance of digital epidemiology. N Engl J Med 2013 Aug 1;369(5):401-404.
14. Poti JM, Popkin BM. Trends in energy intake among US children by eating location and food source, 1977-2006. J Am Diet Assoc 2011 Aug;111(8):1156-1164.
15. Nielsen SJ, Siega-Riz AM, Popkin BM. Trends in food locations and sources among adolescents and young adults. Prev Med 2002 Aug;35(2):107-113.
16. Kant AK, Graubard BI. Eating out in America, 1987-2000: trends and nutritional correlates. Prev Med 2004 Feb;38(2):243-249.
17. OpenTable. URL: http://www.opentable.com/ [accessed 2013-11-27]
18. Truscott J, Fraser C, Cauchemez S, Meeyai A, Hinsley W, Donnelly CA, et al. Essential epidemiological mechanisms underpinning the transmission dynamics of seasonal influenza. J R Soc Interface 2012 Feb 7;9(67):304-312.
19. Bloom RM, Buckeridge DL, Cheng KE. Finding leading indicators for disease outbreaks: filtering, cross-correlation, and caveats. J Am Med Inform Assoc 2007;14(1):76-85.
20. Massachusetts Department of Public Health Weekly Influenza Update. 2013 May 24. URL: http://ma-publichealth.typepad.com/files/weekly_flu_report_5-24-13.pdf [accessed 2013-11-29]
21. Influenza in Maryland 2012-2013 Season Report. URL: http://phpa.dhmh.maryland.gov/influenza/fluwatch/Past%20Flu%20Season%20Summaries/FINAL%20FLU%20REPORT%202012_13_9SEP13_Final.pdf [accessed 2013-11-27]
22. Weekly Influenza Surveillance in Georgia, 2012-2013. URL: http://health.state.ga.us/epi/flu/fluupd11.asp [accessed 2013-11-27]
23. Florida Flu Review 2012-2013 season. URL: http://www.floridahealth.gov/diseases-and-conditions/influenza/_documents/week17.pdf [accessed 2013-11-27]
24. Ginsberg J, Mohebbi MH, Patel RS, Brammer L, Smolinski MS, Brilliant L. Detecting influenza epidemics using search engine query data. Nature 2009 Feb 19;457(7232):1012-1014.
25. Pan American Health Organization. URL: http://ais.paho.org/phip/viz/ed_flu.asp [accessed 2013-11-27]
26. Cleveland WS. Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association 1979 Dec;74(368):829-836.
27. Carrat F, Vergu E, Ferguson NM, Lemaitre M, Cauchemez S, Leach S, et al. Time lines of infection and disease in human influenza: a review of volunteer challenge studies. Am J Epidemiol 2008 Apr 1;167(7):775-785.
28. Viboud C, Bjørnstad ON, Smith DL, Simonsen L, Miller MA, Grenfell BT. Synchrony, waves, and spatial hierarchies in the spread of influenza. Science 2006 Apr 21;312(5772):447-451.
29. Brownstein JS, Wolfe CJ, Mandl KD. Empirical evidence for the effect of airline travel on inter-regional influenza spread in the United States. PLoS Med 2006 Sep;3(10):e401.
30. Opatowski L, Fraser C, Griffin J, de Silva E, Van Kerkhove MD, Lyons EJ, et al. Transmission characteristics of the 2009 H1N1 influenza pandemic: comparison of 8 Southern hemisphere countries. PLoS Pathog 2011 Sep;7(9):e1002225.
31. Merler S, Ajelli M. The role of population heterogeneity and human mobility in the spread of pandemic influenza. Proc Biol Sci 2010 Feb 22;277(1681):557-565.
ARIMA: autoregressive integrated moving average
CCF: cross-correlation function
IARPA: Intelligence Advanced Research Projects Activity
ILI: influenza-like illness
PAHO: Pan American Health Organization
Edited by G Eysenbach; submitted 30.09.13; peer-reviewed by Z Huang, J Schwind; comments to author 30.10.13; revised version received 02.12.13; accepted 30.12.13; published 22.01.14
©Elaine O Nsoesie, David L Buckeridge, John S Brownstein. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 22.01.2014.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.