Subregional Nowcasts of Seasonal Influenza Using Search Trends

Background: Limiting the adverse effects of seasonal influenza outbreaks at state or city level requires close monitoring of localized outbreaks and reliable forecasts of their progression. Whereas forecasting models for influenza or influenza-like illness (ILI) are becoming increasingly available, their applicability to localized outbreaks is limited by the nonavailability of real-time observations of the current outbreak state at local scales. Surveillance data collected by various health departments are widely accepted as the reference standard for estimating the state of outbreaks, and in the absence of surveillance data, nowcast proxies built using Web-based activities such as search engine queries, tweets, and access of health-related webpages can be useful. Nowcast estimates of state and municipal ILI were previously published by Google Flu Trends (GFT); however, validations of these estimates were seldom reported.

Objective: The aim of this study was to develop and validate models to nowcast ILI at subregional geographic scales.

Methods: We built nowcast models based on autoregressive (autoregressive integrated moving average; ARIMA) and supervised regression methods (random forests) at the US state level using regional weighted ILI and Web-based search activity derived from Google's Extended Trends application programming interface. We validated the performance of these methods using actual surveillance data for the 50 states across six seasons. We also built state-level nowcast models using state-level estimates of ILI and compared the accuracy of these estimates with the estimates of the regional models extrapolated to the state level and with the nowcast estimates published by GFT.

Results: Models built using regional ILI extrapolated to state level had a median correlation of 0.84 (interquartile range [IQR]: 0.74-0.91) and a median root mean square error (RMSE) of 1.01 (IQR: 0.74-1.50), with noticeable variability across seasons and by state population size. Model forms that hypothesize the availability of timely state-level surveillance data show significantly lower errors of 0.83 (0.55-0.23). Compared with GFT, the latter model forms have lower errors but also lower correlation.

Conclusions: These results suggest that the proposed methods may be an alternative to the discontinued GFT and that further improvements in the quality of subregional nowcasts may require increased access to more finely resolved surveillance data.


Model Formulation
Let $y^{r}_{1:w}$ denote the logit-transformed ILI observations for region $r$ through week $w$; $X^{t,r}_{1:v} = [qf(t, r, 1), qf(t, r, 2), \ldots, qf(t, r, v)]$ the vector of logit-transformed query fractions for term $t$ at HHS region $r$ through week $v$; and $Q$ the feature set of terms identified as explanatory variables. The predictor matrix with query fractions for all terms in $Q$ is thus:

$$X^{r}_{1:v} = \left[ X^{t,r}_{1:v} \right]_{t \in Q}$$

Due to the lag in ILI release, $v \geq w + 1$. We fit an ARIMA model using observations through week $w$ and forecast ahead through week $v$:

$$\tilde{y}^{r}_{1:v} = \mathrm{ARIMA}\left( y^{r}_{1:w} \right)$$

The ARIMA result is added as an additional explanatory variable to the predictor matrix, yielding $\tilde{X}^{r}_{1:v} = [\, X^{r}_{1:v} \;\; \tilde{y}^{r}_{1:v} \,]$.

Using $\tilde{X}^{r}_{1:w}$ as the predictor matrix and $y^{r}_{1:w}$ as the vector of responses, we train a random forest model, $\hat{f}^{r}_{w}$, for region $r$ at week $w$:

$$\hat{f}^{r}_{w} : \tilde{X}^{r}_{1:w} \mapsto y^{r}_{1:w}$$

For a state s in region r with a query fraction matrix

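Alternate model forms: state ILI as response
Let $y^{s}_{1:w}$, the logit-transformed ILI observations for state $s$ through week $w$, and $X^{s}_{1:v}$, the logit-transformed query fractions for state $s$ through week $v$, be defined analogously to their regional counterparts. We fit an ARIMA model using state-level ILI.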
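The construction described under Model Formulation, in which an autoregressive forecast is appended to the query-fraction predictors as one more explanatory variable, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a least-squares AR(1) fit stands in for the ARIMA model, the random forest step is left out (any regressor could be trained on the resulting rows), and every function name and number below is hypothetical.

```python
import math


def logit(p):
    """Logit transform of a proportion in (0, 1)."""
    return math.log(p / (1.0 - p))


def fit_ar1(y):
    """Least-squares fit of y[t] = a + b * y[t-1].

    Assumption: AR(1) is used here only as a simple stand-in for ARIMA.
    """
    x, z = y[:-1], y[1:]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / sxx
    a = mz - b * mx
    return a, b


def ar1_series(y, v):
    """Fitted values for weeks 1..w followed by forecasts for weeks w+1..v.

    Week 1 has no predecessor, so the observation itself is used there.
    """
    a, b = fit_ar1(y)
    fitted = [y[0]] + [a + b * prev for prev in y[:-1]]
    last, forecasts = y[-1], []
    for _ in range(v - len(y)):
        last = a + b * last          # iterate the recursion past week w
        forecasts.append(last)
    return fitted + forecasts


def augment(X, ar_values):
    """Append the autoregressive value as an extra column of each weekly row."""
    return [row + [val] for row, val in zip(X, ar_values)]


# Hypothetical data: w = 6 weeks of ILI, v = 7 weeks of query fractions (2 terms).
ili = [0.010, 0.012, 0.015, 0.019, 0.024, 0.030]
qf = [[2e-6, 5e-7], [3e-6, 6e-7], [4e-6, 8e-7], [5e-6, 9e-7],
      [7e-6, 1.2e-6], [9e-6, 1.5e-6], [1.1e-5, 1.9e-6]]

y = [logit(p) for p in ili]
X = [[logit(f) for f in week] for week in qf]
w, v = len(y), len(X)

X_tilde = augment(X, ar1_series(y, v))
train_X, train_y = X_tilde[:w], y     # rows a regressor would be trained on
nowcast_X = X_tilde[w:]               # rows for weeks without released ILI
print(len(train_X), len(nowcast_X), len(X_tilde[0]))
```

The split at week w reflects the release lag: responses exist only through week w, while predictors, including the autoregressive forecast, extend through week v.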

Sensitivity Analysis of the choice of floor
For the analysis reported in the manuscript, query fractions that were zero were replaced by a very small value (1E-12) before the logit transformation was applied. We performed a sensitivity analysis on the choice of floor by testing two alternative floor values: 1E-8 and 1E-10. Figure S2 shows that correlation (COR), RMSE, and MAPE are virtually unchanged for all model variants during the 6 seasons for any of the three floor values. Additionally, we performed a Friedman-Nemenyi test and established that the differences among floor values are not statistically significant.
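The flooring step can be illustrated with a short sketch. The function name `logit_with_floor` is hypothetical, and the sketch assumes that only exact zeros are replaced, as described above; non-zero fractions pass through unchanged, which is why the choice of floor affects only the zero entries.

```python
import math


def logit_with_floor(fraction, floor=1e-12):
    """Logit-transform a query fraction, replacing an exact zero with `floor`."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("query fraction must lie in [0, 1)")
    p = floor if fraction == 0.0 else fraction
    return math.log(p / (1.0 - p))


# Compare the three floor values considered in the sensitivity analysis.
for floor in (1e-12, 1e-10, 1e-8):
    print(f"floor={floor:g}  logit(0) -> {logit_with_floor(0.0, floor):.2f}")
```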

Analysis of the effect of inheritance on nowcast quality
For the three model variants that inherit regional query fractions (RRS, SRS and SSS), we re-estimated nowcasts without inheritance, i.e., with the state query fractions left unaltered. We compared the quality of these estimates with the estimates that result from the use of inheritance. Figure S3 shows that inheritance improves correlation overall, particularly in low-population states, but has no significant impact on root mean squared error and increases mean absolute proportion error. We performed paired Wilcoxon tests to test for significance and found that the differences in correlation and MAPE are significant, whereas the difference in RMSE is not.
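A paired comparison of this kind can be sketched with a small Wilcoxon signed-rank routine. This is an illustrative version that uses the normal approximation without continuity or tie corrections, so its p-values are rough for small samples; it is not the procedure or software used in the analysis, and the numbers in the usage example are synthetic.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided paired Wilcoxon signed-rank test (normal approximation).

    Returns (W, p), where W is the smaller of the positive and negative
    rank sums. Pairs with zero difference are dropped.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided tail probability
    return w, p


# Synthetic per-state correlations with and without inheritance (made up).
with_inheritance = [0.86, 0.81, 0.90, 0.78, 0.84, 0.88, 0.79, 0.83]
without_inheritance = [0.82, 0.80, 0.87, 0.70, 0.85, 0.84, 0.73, 0.80]
w_stat, p_value = wilcoxon_signed_rank(with_inheritance, without_inheritance)
print(f"W = {w_stat}, p = {p_value:.3f}")
```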

| Model form | Response | ARIMA trained on | Query fractions |
|---|---|---|---|
| RR0 | ILI - Regional | ILI - Regional | |
| RRR | ILI - Regional | ILI - Regional | GET - Regional |
| RRS | ILI - Regional | ILI - Regional | GET - State |
| SRR | ILI - State | ILI - Regional | GET - Regional |
| SRS | ILI - State | ILI - Regional | GET - State |
| SSS | ILI - State | ILI - State | GET - State |

We performed paired Wilcoxon signed-rank tests (1, 2) to check whether there is a statistically significant difference in the mean of the measures (COR, RMSE and MAPE) between pairs of model forms, to complement the Friedman/Nemenyi tests reported in the manuscript.

Terms: afrin, baby cough, benzonatate, body temperature, bronchitis, child fever, cold and flu, cold or flu, cold remedies, cold symptoms, cold vs flu, common cold, cough and cold, cough fever, cough medicine, coughing, coughing up, cure flu, cure for flu, cure the flu, delsym, dry cough, feed a cold, fever flu, fever in children, fever temperature, flu and cold, flu care, flu children, flu fever, flu how long, flu how long contagious, flu in adults, flu incubation, flu kids, flu or cold, flu pneumonia, flu remedies, flu report, flu sore throat, flu stomach, flu symptoms children, flu symptoms fever, flu symptoms in children, flu temperature, flu type a, flu virus, flu vs cold, how long does the flu last, how long flu, human temperature, incubation period, influenza, influenza a, influenza b, influenza symptoms, is bronchitis, low body temperature, low temperature, nasal congestion, nyquil, oscillococcinum, pneumonia contagious, pneumonia symptoms, pneumonia treatment, robitussin, robitussin dm, sinus infection, sinusitis, starve a cold, stomach flu, stop coughing, strep throat, strep throat symptoms, symptoms of flu, symptoms of sinus infection, tamiflu side effects, temperature fever, the flu, the flu virus, the flue, toddler cough, treat flu, tussionex, tylenol cold, type a flu, upper respiratory, upper respiratory infection, viral flu, walking pneumonia