The Google Flu Trends service was launched in 2008 to track changes in the volume of online search queries related to flu-like symptoms. Over the last few years, the trend data produced by this service have shown a consistent relationship with the actual number of flu reports collected by the US Centers for Disease Control and Prevention (CDC), often identifying increases in flu cases weeks in advance of CDC records. However, contrary to popular belief, Google Flu Trends is not an early epidemic detection system. Instead, it is designed as a baseline indicator of the trend, or changes, in the number of disease cases.
To evaluate whether these trends can be used as a basis for an early warning system for epidemics.
We present the first detailed algorithmic analysis of how Google Flu Trends can be used as a basis for building a fully automated system for early warning of epidemics in advance of methods used by the CDC. Based on our work, we present a novel early epidemic detection system, called FluBreaks (dritte.org/flubreaks), based on Google Flu Trends data. We compared the accuracy and practicality of three types of algorithms: normal distribution algorithms, Poisson distribution algorithms, and negative binomial distribution algorithms. We explored the relative merits of these methods, and related our findings to changes in Internet penetration and population size for the regions in Google Flu Trends providing data.
Across our performance metrics of percentage true-positives (RTP), percentage false-positives (RFP), percentage overlap (OT), and percentage early alarms (EA), Poisson- and negative binomial-based algorithms performed better in all except RFP. Poisson-based algorithms had average values of 99%, 28%, 71%, and 76% for RTP, RFP, OT, and EA, respectively, whereas negative binomial-based algorithms had average values of 97.8%, 17.8%, 60%, and 55% for RTP, RFP, OT, and EA, respectively. Moreover, the EA was also affected by the region’s population size. Regions with larger populations (regions 4 and 6) had higher values of EA than region 10 (which had the smallest population) for negative binomial- and Poisson-based algorithms. The difference was 12.5% and 13.5% on average in negative binomial- and Poisson-based algorithms, respectively.
We present the first detailed comparative analysis of popular early epidemic detection algorithms on Google Flu Trends data. We note that realizing this opportunity requires moving beyond the cumulative sum and historical limits method-based normal distribution approaches, traditionally employed by the CDC, to negative binomial- and Poisson-based algorithms to deal with potentially noisy search query data from regions with varying population and Internet penetrations. Based on our work, we have developed FluBreaks, an early warning system for flu epidemics using Google Flu Trends.
Infodemiology introduced the use of nontraditional data sources for the detection of disease trends and outbreaks [
In the absence of other real-time disease surveillance mechanisms, services such as Google Flu Trends are vitally important for the early detection of epidemics. Existing research on using Google Flu Trends for epidemic detection has focused on addressing this need by collecting data related to the volume of queries for disease symptoms. This work demonstrates that Google search query trends closely follow the actual disease cases reported by the CDC. While these results provide strong support for the potential use of Google Flu Trends data as a basis for an early warning system for epidemics, existing research needs to be advanced along two essential directions to realize this opportunity. First, there is a need to rigorously explore and evolve algorithms for higher-level inference from the Google Flu Trends data that can generate alerts at early stages of epidemics. In particular, the ability of existing approaches to collect raw search volume data needs to be supplemented with computational intelligence to translate these data into actionable information. Second, there is also a need to develop a more detailed appreciation of how changes in population size and Internet penetration affect the ability of a system based on Google Flu Trends data to provide accurate and actionable information.
In this study, we aimed to provide new insights related to these opportunities. We built upon Google Flu Trends data and compared the accuracy and practicality of widely used algorithms for early epidemic detection. These algorithms are classified into three categories based on the type of data distribution they expect. The classifications in question are normal distribution algorithms, Poisson distribution algorithms, and negative binomial distribution algorithms. For normal distribution algorithms, we used cumulative sum (CUSUM) [
Traditional disease surveillance networks such as the CDC take up to 2 weeks to collect, process, and report disease cases registered at health centers [
Google Flu Trends [
Google Flu Trends compares the popularity of the 50 million most common Google search queries in the United States with flu-like illness rates reported by the CDC’s national surveillance program. The Flu Trends data are derived from a pool of 45 search terms that relate to symptoms, remedies, and complications of flu and generate a trend that closely correlates with CDC data on influenza-like illnesses.
In our experiments, we used Google Flu Trends data from the 9 years between 2003 and 2011.
Information on patient visits to health care providers in the United States for influenza-like illness is collected through the Outpatient Influenza-like Illness Surveillance Network (ILINet). ILINet consists of more than 3000 health care providers in all 50 states, reporting over 25 million patient visits each year. Each week, approximately 1800 outpatient care sites around the United States report data to the CDC on the total number of patients seen and the number of those patients with influenza-like illnesses. For this system, an influenza-like illness is defined as fever (temperature of 100°F [37.8°C] or greater) and a cough or a sore throat in the absence of a known cause other than influenza. Sites with electronic records use an equivalent definition as determined by state public health authorities. The percentage of patient visits to health care providers for influenza-like illnesses reported each week is weighted on the basis of a state’s population. This percentage is compared each week with the national baseline of 2.5%. The baseline is the mean percentage of patient visits for influenza-like illnesses during noninfluenza weeks for the previous three seasons plus 2 standard deviations [
In our experiments, much like Google Flu Trends data, we used CDC influenza-like illness data from the 9 years between 2003 and 2011. Though the CDC has missing data in the nonflu season between 2009 and 2010, we believe this had a minimal effect on our quantitative comparison.
For determining periods of outbreaks, that is, their starting points in time and their durations, we consulted two epidemiologists from different institutes. The first was from the Institute of Public Health, Lahore, Pakistan (responsible for informing the provincial health ministry about disease outbreaks) and the second was from Quaid-e-Azam Medical College, Bahawalpur, Pakistan. These original outbreaks were marked on CDC influenza-like illness data [
The early epidemic detection algorithms that we have used are divided into three categories, based on the expected distribution in data: (1) normal distribution algorithms: these expect normal distribution in the data, (2) Poisson distribution algorithms: these expect a Poisson distribution, and (3) negative binomial distribution algorithms: these expect a negative binomial distribution.
The algorithms classified in this category are the Early Aberration Reporting System (EARS) algorithm (CUSUM), HLM, and HCusum.
EARS was developed and used by the CDC. EARS comprises three syndromic surveillance early event detection methods called C1, C2, and C3 [
The C1, C2, and C3 EARS algorithms require a baseline (training period) and cut-off (threshold) as parameters. In our experiments, we used both 4 weeks and 8 weeks as the baseline. A shorter training period (baseline) has been shown to insulate CUSUM from seasonal changes [
Since CUSUM raises alarms on the basis of the mean and standard deviation, it is best suited to outbreaks in normally distributed data. This makes the algorithm very sensitive to a sudden rise, which generates an early alarm. It also expects a sustained rise in the data for an outbreak signal to continue, because the start of a rise becomes part of the historical data, which in turn raises the mean and standard deviation used by the algorithm.
Early Aberration Reporting System (EARS) algorithm equations. C1 = cumulative sum (CUSUM) score of C1 algorithm, C2 = CUSUM score of C2 algorithm, C3 = CUSUM score of C3 algorithm, sigma = standard deviation, X-bar = mean number of cases, Xn = number of cases in current time interval. Subscripts refer to a specific variable being linked to one of the three algorithms.
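To make the C1, C2, and C3 logic concrete, the following Python sketch implements the commonly published EARS recursions: C1 standardizes the current count against the immediately preceding baseline, C2 does the same with a guard band between baseline and current point, and C3 accumulates the positive excess of C2 over the last 3 points. The baseline length, guard band, and cut-offs are illustrative parameters, not values fixed by this study.

```python
import numpy as np

def ears_scores(x, baseline=8, guard=2, eps=1e-9):
    """Sketch of the EARS statistics on a weekly series of counts.

    C1: standardized deviation from the `baseline` points just before t.
    C2: the same, but with `guard` points skipped between baseline and t.
    C3: accumulated positive excess of C2 over the last 3 points.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    c1 = np.full(n, np.nan)
    c2 = np.full(n, np.nan)
    c3 = np.full(n, np.nan)
    for t in range(n):
        if t >= baseline:
            b = x[t - baseline:t]
            c1[t] = (x[t] - b.mean()) / (b.std(ddof=1) + eps)
        if t >= baseline + guard:
            b = x[t - baseline - guard:t - guard]
            c2[t] = (x[t] - b.mean()) / (b.std(ddof=1) + eps)
    for t in range(2, n):
        window = c2[t - 2:t + 1]
        if not np.isnan(window).any():
            c3[t] = np.maximum(window - 1.0, 0.0).sum()
    return c1, c2, c3

# An alarm is raised when a score exceeds its cut-off,
# for example c1 > 3 (roughly mean + 3 standard deviations).
```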
The CUSUM methods used in EARS do not account for seasonality by design; however, the HLM incorporates historical data. In HLM an outbreak is signaled when the identity in
In the HLM, the system determines the expected value of a week by (1) using 3 consecutive weeks from every year in the historical data: the current week, the preceding week, and the subsequent week (HLM-3); or (2) using 5 consecutive weeks from every year in the historical data: the current week, the preceding 2 weeks, and the subsequent 2 weeks (HLM-5) (
The above two variations (HLM-3, which uses 15 baseline points, and HLM-5, which uses 25 baseline points) are recommended by Pelecanos et al [
We used both HLM-3 and HLM-5, in which the training period comprised 5 years, starting from 2003 and ending in 2008. For determining outbreaks within the training period, we removed 1 year at a time from the timeline between 2003 and 2008 (both years inclusive). Then we assumed that the remaining years were consecutive and determined outbreaks during the omitted year by using the remaining 4 years. This process was repeated for every year of the training period.
Like EARS, HLM operates on the mean and standard deviation of the data. Its definition of an outbreak therefore assumes a normal distribution and marks any outlier under that distribution as an outbreak.
Historical limits method (HLM) equation. Sigma = standard deviation, X = number of reported cases in the current period, X-bar = mean.
Historical data of the historical limits methods (HLM). HLM-3 = 3 consecutive weeks in the historical data, HLM-5 = 5 consecutive weeks in the historical data.
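As a minimal sketch, the HLM check for a single week can be written as follows, assuming the X > X-bar + 2 sigma form suggested by the equation caption above and a 52-week year; half_window selects between HLM-3 and HLM-5:

```python
import numpy as np

def hlm_signal(series, t, years=5, half_window=1, period=52):
    """Sketch of the historical limits method for week t of a weekly series.

    half_window=1 reproduces HLM-3 (3 weeks per historical year,
    15 baseline points over 5 years); half_window=2 gives HLM-5.
    """
    if t < years * period + half_window:
        raise ValueError("not enough history before week t")
    baseline = []
    for y in range(1, years + 1):
        center = t - y * period            # same calendar week, y years back
        baseline.extend(series[center - half_window:center + half_window + 1])
    baseline = np.asarray(baseline, dtype=float)
    # Outbreak if the current week exceeds the historical mean + 2 SD.
    return series[t] > baseline.mean() + 2.0 * baseline.std(ddof=1)
```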
HCusum is a seasonally adjusted CUSUM [
An outbreak is declared if the identity in (c) is true.
Historical cumulative sum (HCUSUM) equations. Sigma = standard deviation, Xn = number of cases in current time interval, X-bar = mean number of cases. N = 5, as the baseline period is the preceding 5 years.
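The exact HCusum equations are given in the cited reference; the sketch below only illustrates the general pattern of a one-sided CUSUM whose baseline mean and standard deviation come from the same calendar weeks of the preceding 5 years. The reference value k and the window choices are illustrative assumptions:

```python
import numpy as np

def hcusum(series, years=5, half_window=1, period=52, k=0.5):
    """Sketch of a seasonally adjusted CUSUM: baseline mean and SD for each
    week come from the same calendar weeks of the preceding `years` years,
    and a one-sided CUSUM accumulates the standardized excess over k."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    s = np.zeros(n)
    start = years * period + half_window
    for t in range(start, n):
        baseline = []
        for y in range(1, years + 1):
            center = t - y * period
            baseline.extend(series[center - half_window:center + half_window + 1])
        baseline = np.asarray(baseline)
        z = (series[t] - baseline.mean()) / (baseline.std(ddof=1) + 1e-9)
        s[t] = max(0.0, s[t - 1] + z - k)  # Sn+ stays non-negative
    return s  # outbreak declared when s[t] exceeds a chosen threshold
```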
The algorithms classified in this category are POD, SaTScan, and PSC.
The POD method assumes that the number of cases follows a Poisson distribution. The POD method [
We followed certain suggestions from Pelecanos et al [
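The published POD method has details we do not reproduce here; the following sketch merely illustrates the distributional idea of flagging a week whose count is improbable under a Poisson model with a historically estimated mean. The seasonal baseline construction and the significance level alpha are assumptions:

```python
import numpy as np
from scipy.stats import poisson

def pod_signal(series, t, years=5, half_window=1, period=52, alpha=0.05):
    """Illustrative Poisson outbreak check for week t: the expected count is
    the mean of the same calendar weeks in the preceding `years` years, and
    the week is flagged if its count is improbable under Poisson(expected)."""
    if t < years * period + half_window:
        raise ValueError("not enough history before week t")
    baseline = []
    for y in range(1, years + 1):
        center = t - y * period
        baseline.extend(series[center - half_window:center + half_window + 1])
    lam = float(np.mean(baseline))
    # One-sided tail probability of a count at least this large.
    p_value = poisson.sf(series[t] - 1, lam)
    return p_value < alpha
```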
The SaTScan algorithm can be used for spatial analysis, temporal analysis, and spatiotemporal analysis. We used only the temporal analysis for our outbreak detection, since the spatial mapping is already fixed to a CDC-defined region. We used a Poisson permutation, which works best for data following a Poisson distribution, that is, when the data's variance-to-mean ratio (VMR) is equal to 1.
Temporal SaTScan creates 1-dimensional clusters by sliding and scaling a window within an interval of 60 days. We relied on the Poisson permutation to determine the clusters with the highest likelihood ratio.
The equation (
Once we had the best cluster within an interval, the algorithm calculated the
SaTScan does not accommodate seasonality. Therefore, to adjust SaTScan for seasonality, we scaled the population size of the region under analysis, which SaTScan takes as a parameter, on a weekly basis. The scaling factor for each week depends on that week's incidence rate and the annual population:
Moreover, as CDC and Google data are reported on a weekly cycle, we parameterized SaTScan on a weekly time unit. We set the
SaTScan equation. C = total number of cases, Cz = observed number of cases in window z, LLR = likelihood ratio, nz = expected number of cases or population in window z.
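For illustration, a purely temporal scan over weekly counts can be sketched as follows, scoring each candidate window with the likelihood ratio from the equation above. SaTScan's Monte Carlo significance testing is omitted here, and max_len = 9 weeks approximates the 60-day interval:

```python
import numpy as np

def temporal_scan(cases, expected, max_len=9):
    """Sketch of a purely temporal scan over weekly counts.

    `expected` is assumed scaled so that expected.sum() == cases.sum();
    each window z is scored with the likelihood ratio
    LLR = Cz*log(Cz/nz) + (C-Cz)*log((C-Cz)/(C-nz)) when Cz > nz.
    """
    cases = np.asarray(cases, dtype=float)
    expected = np.asarray(expected, dtype=float)
    C = cases.sum()
    best, best_llr = None, 0.0
    for start in range(len(cases)):
        for length in range(1, min(max_len, len(cases) - start) + 1):
            cz = cases[start:start + length].sum()
            nz = expected[start:start + length].sum()
            if nz <= 0 or cz <= nz or cz >= C:
                continue  # only excess-risk windows are of interest
            llr = cz * np.log(cz / nz) + (C - cz) * np.log((C - cz) / (C - nz))
            if llr > best_llr:
                best, best_llr = (start, length), llr
    return best, best_llr  # SaTScan then assesses significance by Monte Carlo
```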
PSC is an algorithm that efficiently detects anomalies in data that follow a Poisson distribution [
An outbreak is signaled when the computed CUSUM score is higher than the threshold
Poisson cumulative sum (CUSUM) equations. k = reference value, Sn = CUSUM score, Xn = number of cases in current time interval, X-bara = null hypothesis mean, X-bard = alternative hypothesis mean, + superscript refers to values always being positive.
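A minimal sketch of the PSC recursion follows, using the standard likelihood-ratio reference value between the null hypothesis mean (X-bara) and the alternative hypothesis mean (X-bard); how those means are estimated from the baseline is left as an input here:

```python
import numpy as np

def poisson_cusum(x, lam_a, lam_d):
    """Sketch of the Poisson CUSUM: lam_a is the in-control (null hypothesis)
    mean and lam_d the out-of-control (alternative hypothesis) mean."""
    # Standard likelihood-ratio reference value between the two means.
    k = (lam_d - lam_a) / (np.log(lam_d) - np.log(lam_a))
    scores, s = [], 0.0
    for xn in x:
        s = max(0.0, s + xn - k)  # Sn+ = max(0, S(n-1)+ + Xn - k)
        scores.append(s)
    return np.asarray(scores), k

# An outbreak is signaled when the score exceeds a threshold, which the
# study expresses as a multiple of k (for example, threshold = 1 * k).
```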
This category comprises NBC and historical NBC.
We selected NBC [
The out-of-control level c1 is determined by adding 2 times the standard deviation of the baseline period to the mean of the baseline. We kept a baseline interval of 7 weeks and a guard band of 1 week. The guard band prevents the most recent data from being included in baseline calculations. Therefore, the baseline period and current week will have a gap of 1 week as a guard band. The CUSUM score is compared with the threshold value
Negative binomial cumulative sum (CUSUM) equations. k = reference value, (r,c) = parameters of negative binomial distribution, Sn = CUSUM score, sigma = standard deviation, Xn = number of cases in current time interval, X-bar = mean number of cases, + superscript refers to values always being positive.
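The sketch below illustrates an NBC-style recursion under the setup described above (7-week baseline, 1-week guard band, out-of-control level c1 = baseline mean + 2 standard deviations). The dispersion parameter r and the likelihood-ratio form of the reference value k are illustrative assumptions; the study estimates the negative binomial parameters (r, c) from the data:

```python
import numpy as np

def nb_cusum(x, baseline=7, guard=1, r=10.0):
    """Sketch of a negative binomial CUSUM with a 7-week baseline and a
    1-week guard band. c0 is the in-control mean, c1 = c0 + 2 SD the
    out-of-control level; r is an assumed dispersion parameter."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = np.zeros(n)
    for t in range(baseline + guard, n):
        b = x[t - guard - baseline:t - guard]   # guard band excludes recent data
        c0 = b.mean() + 1e-9
        c1 = c0 + 2.0 * b.std(ddof=1) + 1e-9
        # Likelihood-ratio reference value between NB means c0 and c1.
        k = r * np.log((r + c1) / (r + c0)) / np.log(c1 * (r + c0) / (c0 * (r + c1)))
        s[t] = max(0.0, s[t - 1] + x[t] - k)    # Sn+ stays non-negative
    return s  # compare with a threshold such as 1 * k to signal an outbreak
```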
Although NBC with a static threshold captures the longevity of an outbreak, it is sensitive in raising early alarms. To cater for this sensitivity, we introduced variable thresholds for NBC. A new parameter,
Historical NBC is a seasonally adjusted negative binomial CUSUM [
An outbreak is declared if Sn+ exceeds the threshold.
Historical negative binomial cumulative sum (CUSUM) equation. k = reference value, Sn = CUSUM score, Xn = the case count of the current week, + superscript refers to values always being positive.
To understand how Google Flu Trends data can be used to build an early epidemic detection system, we compared the results of 24 variants of 8 base algorithms (from three categories of algorithms) across three regions in the United States. To the best of our knowledge, this paper presents the first comparative analysis of epidemic detection algorithms for Google Flu Trends data.
For our base algorithms, we used EARS CUSUM, HCusum, HLM, POD, SaTScan, PSC, NBC, and HNBC. The characteristics of these algorithms afford a degree of diversity in our analysis: EARS CUSUM and NBC were designed for rapid detection of outbreaks; HCusum, HNBC, HLM, and POD incorporate seasonal changes but require a substantial training period; and SaTScan requires minimal training and offers flexibility in detecting statistically significant disease clusters.
We chose the target regions, as divided by the CDC, to compare the sensitivity of the various algorithms to population size and Internet penetration.
For our comparison with respect to population size, we focused on region 4 (with the largest population) and region 10 (smallest population). For evaluating the impact of Internet penetration, we focused on region 6 (lowest Internet penetration) and region 10 (highest Internet penetration). Results from region 10 are of particular interest, since it has the lowest population and highest Internet penetration. We expect that the results from region 10 could serve as a benchmark of how accurately Google Flu Trends data can be used as a basis for detecting epidemics. Furthermore, the weather in regions 4 and 6 was similar but very different from that in region 10.
In our analysis, we evaluated each algorithm by comparing its results using Google Flu Trends data with the disease cases reported by CDC. We compared the performance of the algorithms on the following key metrics.
Percentage true-positives (RTP) measures the percentage of time an epidemic signaled in the CDC data is also detected by the target algorithm on Google Flu Trends data. This percentage is calculated as the number of outbreak intervals during which a signal was raised, divided by the total number of outbreak intervals, multiplied by 100.
Percentage false-positives (RFP) measures the percentage of time an epidemic not signaled in the CDC data is detected as an epidemic by the target algorithm on Google Flu Trends data. This percentage is calculated as the number of nonoutbreak weeks during which a signal was raised, divided by the total number of weeks with no outbreak, multiplied by 100.
Percentage overlap (OT) measures the percentage of the time an epidemic detected by an algorithm overlaps with the epidemic signaled in CDC data. Any part of a signal that does not overlap with the original outbreak is not considered in OT.
Percentage early alarms (EA) measures the percentage of time an algorithm raises an alarm on Google Flu Trends before it is signaled as an epidemic by the CDC data. The early alarm period is limited to the 2 weeks before the start of the original outbreak. Part of a signal starting before this 2-week time period is considered false-positive.
These four metrics capture different aspects of the detection algorithms. RTP measures the sensitivity of an algorithm to outbreaks. At the same time, an overly sensitive algorithm generates a higher RFP.
The average overlap time captures the stability of an algorithm to transient changes in the rate of disease cases. Algorithms that signal the entire period of an epidemic are more desirable than those that raise short, sporadic signals.
Finally, algorithms that signal an epidemic ahead of other algorithms are more suited for early epidemic detection. However, this metric must be viewed in conjunction with an algorithm's RFP to discount algorithms that generate spurious signals. For our analysis, we counted a signal as an early alarm if it fell within a 2-week window preceding the signal in the CDC data, so long as it was not a continuation of a previous alarm.
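A simplified sketch of the four metrics over weekly boolean series follows; the interval bookkeeping is condensed relative to the definitions above (for example, an early alarm that continues a previous signal is not excluded here):

```python
import numpy as np

def contiguous_runs(mask):
    """[start, end) index pairs of each run of True values."""
    runs, start = [], None
    for i, v in enumerate(mask):
        if v and start is None:
            start = i
        elif not v and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(mask)))
    return runs

def evaluate(alarms, outbreaks, early_window=2):
    """RTP, RFP, OT, EA for weekly boolean series: `alarms` raised by an
    algorithm on Google Flu Trends data, `outbreaks` marked on CDC data."""
    alarms = np.asarray(alarms, dtype=bool)
    outbreaks = np.asarray(outbreaks, dtype=bool)
    intervals = contiguous_runs(outbreaks)
    # RTP: outbreak intervals during which any alarm was raised.
    rtp = 100.0 * np.mean([alarms[a:b].any() for a, b in intervals])
    # RFP: non-outbreak weeks on which an alarm was raised.
    rfp = 100.0 * alarms[~outbreaks].mean()
    # OT: outbreak weeks covered by an alarm.
    ot = 100.0 * alarms[outbreaks].mean()
    # EA: outbreaks preceded by an alarm in the 2 weeks before their start.
    ea = 100.0 * np.mean([alarms[max(0, a - early_window):a].any()
                          for a, _ in intervals])
    return rtp, rfp, ot, ea
```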
Population and percentage Internet use by US Department of Health and Human Services (HHS) region.
HHS region | Population | % Internet use | States
1 | 14,412,684 | 74.07 | CT, ME, MA, NH, RI, VT |
2 | 28,224,114 | 70.20 | NJ, NY |
3 | 29,479,361 | 69.30 | DE, DC, MD, PA, VA, WV |
4 | 60,088,178 | 63.25 | AL, FL, GA, KY, MS, NC, SC, TN |
5 | 51,745,410 | 71.42 | IL, IN, MI, MN, OH, WI |
6 | 37,860,549 | 61.56 | AR, LA, NM, OK, TX |
7 | 31,840,178 | 71.68 | IA, KS, MO, NE |
8 | 20,802,785 | 72.13 | CO, MT, ND, SD, UT, WY |
9 | 46,453,010 | 67.95 | AZ, CA, HI, NV |
10 | 6,691,325 | 76.93 | AK, ID, OR, WA |
US Department of Health and Human Services regions.
Each Multimedia Appendix contains a sorted column (overall position of algorithm), in which the algorithms are sorted by their median across the four performance metrics. We chose the median to reduce the influence of extreme values in the performance metrics.
Although we divided the algorithms into three categories, namely Poisson, negative binomial, and normal distribution algorithms, another subcategory, historical algorithms, surfaced during our analysis. It cuts across the negative binomial and normal distribution categories, as it contains algorithms from both: HNBC from the negative binomial category, and HLM and HCusum from the normal distribution category, showed a similar pattern of results across the four performance metrics. Therefore, for the remainder of the discussion, we add historical algorithms as a classification and analyze their results independently.
In the first performance metric, RTP, negative binomial distribution algorithms led with an average of 99.0%, closely followed by Poisson distribution algorithms (98.8%) and normal distribution algorithms (96.4%), whereas historical algorithms trailed at 64.0%.
In the second performance metric, RFP, the pattern reversed: the historical algorithms showed remarkably good values (average 3.3%, where lower is better), whereas normal, negative binomial, and Poisson distribution algorithms showed percentages of 11.4%, 28.3%, and 17.5%, respectively. Clearly the historical and normal distribution algorithms led in this metric.
In the third metric, OT, negative binomial distribution algorithms led with 71.3%, followed by Poisson distribution algorithms (60.3%), historical algorithms (30.8%), and normal distribution algorithms (16.4%). In this metric, the negative binomial and Poisson distribution algorithms led by a wide margin over the historical and normal distribution algorithms.
In the fourth and last metric, EA, negative binomial distribution algorithms, on average, led with an EA value of 75.8%, followed by Poisson distribution algorithms (55.1%), normal distribution algorithms (36.8%), and historical algorithms (22.3%).
For some performance metrics, certain categories did not perform consistently, and values of these categories varied over a large range. In normal distribution algorithms, the values of EA varied from 0% to 75%. In Poisson distribution algorithms, EA varied from 13% to 75%. Therefore, in these cases the average value of that particular metric could not be considered representative, and we needed to examine the algorithms (or variations of algorithms) for suitability.
When we looked at EA values for the normal distribution algorithms, the C3 variations of EARS showed a high EA value for only one region; otherwise, the next best values were barely in the optimal range. Moreover, the OT of C3 was at best 34%, which is very low and makes this algorithm unsuitable.
In the case of the EA values for the Poisson distribution algorithms, the SaTScan algorithm pulled the category's average EA down. If we considered the average EA value of Poisson distribution algorithms without SaTScan, it rose from 55.1% to 66.7%.
Overall, negative binomial and Poisson distribution algorithms performed much better than normal distribution algorithms, mainly because of the data distribution that these algorithms expect. The VMR of seasonal influenza-like illness data was greater than 1 most of the time (
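For reference, a rolling VMR like the one plotted alongside the CDC data can be computed as follows; the window length is an illustrative choice:

```python
import numpy as np

def rolling_vmr(x, window=8):
    """Rolling variance-to-mean ratio: VMR = 1 suggests Poisson-like data,
    VMR > 1 overdispersion better matched by a negative binomial."""
    x = np.asarray(x, dtype=float)
    return np.array([x[i - window:i].var(ddof=1) / (x[i - window:i].mean() + 1e-9)
                     for i in range(window, len(x) + 1)])
```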
Historical algorithms performed poorly because they considered only data from the same period in past years to declare an outbreak; they did not consider the distribution of data during the current year. This made them robust in terms of false-positives, but their performance across the other metrics lagged substantially.
Furthermore, to understand the impact of population variation and change in Internet penetration across regions, we picked the top two algorithms from the negative binomial distribution and Poisson distribution algorithms and applied them to all the regions (instead of just three).
The result of this analysis showed that in regions of high Internet penetration the RFP and OT were high.
Average percentages of various performance metrics for various categories of algorithms.
Metric | Normal | Negative binomial | Poisson | Historical
RTPa | 96.4 | 99.0 | 98.8 | 64.0 |
RFPb | 11.4 | 28.3 | 17.5 | 3.3 |
OTc | 16.4 | 71.3 | 60.3 | 30.8 |
EAd | 36.8 | 75.8 | 55.1 | 22.3 |
a Percentage true-positives.
b Percentage false-positives.
c Percentage overlap time.
d Percentage early alarms.
Result of negative binomial cumulative sum (cut-off = 15), for all performance metrics across all Department of Health and Human Services (HHS) regions of the United States.
HHS region | RTPa | RFPb | OTc | EAd |
1 | 100 | 45 | 98 | 87.5 |
2 | 100 | 40 | 85 | 77.7 |
3 | 100 | 40 | 88 | 87.5 |
4 | 100 | 30 | 81 | 88 |
5 | 100 | 40 | 95 | 87.5 |
6 | 100 | 40 | 76 | 88 |
7 | 100 | 40 | 95 | 87.5 |
8 | 87.5 | 50 | 83 | 75 |
9 | 90 | 40 | 71 | 80 |
10 | 100 | 40 | 82 | 71 |
a Percentage true-positives.
b Percentage false-positives.
c Percentage overlap time.
d Percentage early alarms.
Result of negative binomial cumulative sum (threshold = 1 * k) for all performance metrics across all Department of Health and Human Services (HHS) regions of the United States.
HHS region | RTPa | RFPb | OTc | EAd |
1 | 100 | 35 | 87 | 87.5 |
2 | 100 | 27 | 74 | 66.7 |
3 | 100 | 20 | 81 | 75 |
4 | 100 | 20 | 70 | 75 |
5 | 100 | 30 | 86 | 75 |
6 | 100 | 20 | 63 | 75 |
7 | 100 | 30 | 87 | 75 |
8 | 87.5 | 40 | 71 | 75 |
9 | 90 | 30 | 64 | 70 |
10 | 100 | 30 | 68 | 71 |
a Percentage true-positives.
b Percentage false-positives.
c Percentage overlap time.
d Percentage early alarms.
Result of Poisson cumulative sum (threshold = 1 * k) for all performance metrics across all Department of Health and Human Services (HHS) regions of the United States.
HHS region | RTPa | RFPb | OTc | EAd |
1 | 100 | 35 | 83 | 87.5 |
2 | 100 | 27 | 71 | 66.7 |
3 | 100 | 20 | 80 | 75 |
4 | 100 | 20 | 70 | 75 |
5 | 100 | 30 | 84 | 75 |
6 | 100 | 20 | 62 | 75 |
7 | 100 | 30 | 84 | 75 |
8 | 87.5 | 40 | 67 | 75 |
9 | 90 | 30 | 64 | 70 |
10 | 100 | 30 | 68 | 57 |
a Percentage true-positives.
b Percentage false-positives.
c Percentage overlap time.
d Percentage early alarms.
Result of Poisson outbreak detection for all performance metrics across all Department of Health and Human Services (HHS) regions of the United States.
HHS region | RTPa | RFPb | OTc | EAd |
1 | 100 | 35 | 77 | 33 |
2 | 100 | 20 | 70 | 40 |
3 | 100 | 30 | 69 | 50 |
4 | 100 | 20 | 58 | 75 |
5 | 100 | 40 | 72 | 50 |
6 | 100 | 20 | 50 | 75 |
7 | 100 | 30 | 72 | 75 |
8 | 87.5 | 30 | 74 | 75 |
9 | 90 | 20 | 57 | 40 |
10 | 100 | 20 | 68 | 57 |
a Percentage true-positives.
b Percentage false-positives.
c Percentage overlap time.
d Percentage early alarms.
US Department of Health and Human Services region 4. The graph plots the Google Flu Trends and Centers for Disease Control and Prevention (CDC) data over time. The horizontal bars indicate where each method detected an epidemic. Cut indicates the cut-off point (a higher value is less sensitive) and b indicates baseline data (training window). The thick horizontal bars at the bottom show the actual outbreak. HCusum = historical cumulative sum, HLM = historical limits method, HNBC = historical negative binomial cumulative sum, ILI = influenza-like illnesses, k = reference value for threshold, NBC = negative binomial cumulative sum, POD = Poisson outbreak detection, PSC = Poisson cumulative sum.
US Department of Health and Human Services region 6. The graph plots the Google Flu Trends and Centers for Disease Control and Prevention (CDC) data over time. The horizontal bars indicate where each method detected an epidemic. Cut indicates the cut-off point (a higher value is less sensitive) and b indicates baseline data (training window). The thick horizontal bar at the bottom shows the actual outbreak. HCusum = historical cumulative sum, HLM = historical limits method, HNBC = historical negative binomial cumulative sum, ILI = influenza-like illnesses, k = reference value for threshold, NBC = negative binomial cumulative sum, POD = Poisson outbreak detection, PSC = Poisson cumulative sum.
US Department of Health and Human Services region 10. The graph plots the Google Flu Trends and Centers for Disease Control and Prevention (CDC) data over time. The horizontal bars indicate where each method detected an epidemic. Cut indicates the cut-off point (a higher value is less sensitive) and b indicates baseline data (training window). The thick horizontal bar at the bottom shows the actual outbreak. HCusum = historical cumulative sum, HLM = historical limits method, HNBC = historical negative binomial cumulative sum, ILI = influenza-like illnesses, k = reference value for threshold, NBC = negative binomial cumulative sum, POD = Poisson outbreak detection, PSC = Poisson cumulative sum.
US Centers for Disease Control and Prevention data with the variance-to-mean ratio (VMR) line plotted above, along with the VMR = 1 mark.
In this study, we augmented the capabilities of Google Flu Trends by evaluating various algorithms to translate the raw search query volume produced by this service into actionable alerts. We focused, in particular, on leveraging the ability of Google Flu Trends to provide a near real-time alternative to conventional disease surveillance networks and on exploring the practicality of building an early epidemic detection system using these data. This paper presents the first detailed comparative analysis of popular early epidemic detection algorithms on Google Flu Trends. We explored the relative merits of these methods and considered the effects of changing Internet prevalence and population sizes on the ability of these methods to predict epidemics. In these evaluations, we drew upon data collected by the CDC and assessed, within a consistent experimental framework, the ability of each algorithm to predict changes in measured CDC case frequencies from the Internet search query volume.
Our analysis showed that adding a layer of computational intelligence to Google Flu Trends data provides the opportunity for a reliable early epidemic detection system that can predict disease outbreaks with high accuracy in advance of the existing systems used by the CDC. However, we note that realizing this opportunity requires moving beyond the CUSUM- and HLM-based normal distribution approaches traditionally employed by the CDC. In particular, while we did not find a single best method to apply to Google Flu Trends data, the results of our study strongly support negative binomial- and Poisson-based algorithms being more useful when dealing with potentially noisy search query data from regions with varying Internet penetrations. For such data, we found that normal distribution algorithms did not perform as well as the negative binomial and Poisson distribution algorithms.
Furthermore, our analysis showed that patient data for a disease follow different distributions throughout the year. When the VMR of the data is equal to 1, the data ideally follow a Poisson distribution and can be handled by a Poisson-based algorithm. As increasing variance raises the VMR above 1, the data become overdispersed. Poisson-based algorithms can handle this overdispersion up to a limit [
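As a hypothetical illustration of this reasoning, a system could route incoming data to an algorithm family based on its current VMR; the cut-offs below are illustrative and not values derived in this study:

```python
def pick_family(vmr, tolerance=0.2, nb_cutoff=2.0):
    """Hypothetical routing of data to an algorithm family by its current
    variance-to-mean ratio; both cut-offs are illustrative only."""
    if abs(vmr - 1.0) <= tolerance:
        return "poisson"            # VMR close to 1: Poisson assumptions hold
    if vmr <= nb_cutoff:
        return "poisson"            # mild overdispersion: Poisson copes, up to a limit
    return "negative_binomial"      # strong overdispersion
```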
Our research is the first attempt of its kind to relate epidemic prediction using Google Flu Trends data to Internet penetration and the size of the population being assessed. We believe that understanding how these factors affect algorithms to predict epidemics is an integral question for scaling a search query-based system to a broad range of geographical regions and communities. In our investigations, we observed that both Internet penetration and population size had a definite impact on algorithm performance. SaTScan performs better when applied to data from regions with high Internet penetration and small population size, while POD and NBC achieve better results when Internet penetration is low and population size is large. CUSUM performs best in regions with a large population. While the availability of search query data and measured (ie, CDC) case records restrict our analyses to the United States, we believe many of these insights may be useful in developing an early epidemic prediction system for other regions, including communities in the developing world.
In conclusion, we present an early investigation of algorithms to translate data from services such as Google Flu Trends into a fully automated system for generating alerts when the likelihood of epidemics is quite high. Our research augments the ability to detect disease outbreaks at early stages, when many of the conditions that impose an immense burden globally can be treated with better outcomes and in a more cost-effective manner. In addition, the ability to respond early to imminent conditions allows for more proactive restriction of the size of any potential outbreak. Together, the findings of our study provide a means to convert raw data collected over the Internet into more fine-grained information that can guide effective policy in countering the spread of diseases.
Based on our work, we have developed FluBreaks (dritte.org/flubreaks), an early warning system for flu epidemics using Google Flu Trends.
Ranking of algorithms in different parameters of evaluation for HHS Region 4 (Highest Population).
Ranking of algorithms in different parameters of evaluation for HHS Region 6 (Lowest Percent Internet Use).
Ranking of algorithms in different parameters of evaluation for HHS Region 10 (Lowest Population and Highest Percent Internet Use).
CDC: Centers for Disease Control and Prevention
CUSUM: cumulative sum
EA: percentage early alarms
EARS: Early Aberration Reporting System
HCusum: historical cumulative sum
HLM: historical limits method
HNBC: historical negative binomial cumulative sum
ILINet: Outpatient Influenza-like Illness Surveillance Network
NBC: negative binomial cumulative sum
OT: percentage overlap
POD: Poisson outbreak detection
PSC: Poisson cumulative sum
RFP: percentage false-positives
RTP: percentage true-positives
VMR: variance to mean ratio
We acknowledge Dr Zeeshan Syed of the University of Michigan for his valuable feedback and intellectual contribution. We thank Dr Farkanda Kokab, Professor of Epidemiology, Institute of Public Health, Pakistan, and Dr Ijaz Shah, Professor and Head of Department of Community Medicine, Quaid-e-Azam Medical College, Pakistan, for marking our outbreaks and providing us with valuable feedback. We also thank Dr Lakshminarayanan Subramanian of New York University for reviewing our paper.
None declared.