This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
The popularity of dengue can be inferred from Google Trends, which summarizes Google searches of related topics. Both the disease and its Google Trends results share a common cause in the dengue virus, leading us to hypothesize that dengue incidence and Google Trends results have a long-run equilibrium.
This research aimed to investigate the properties of this long-run equilibrium in the hope of using the information derived from Google Trends for the early detection of upcoming dengue outbreaks.
This research used the cointegration method to assess a long-run equilibrium between dengue incidence and Google Trends results. The long-run equilibrium was characterized by a linear combination of the two series that generates a stationary process. The Dickey-Fuller test was adopted to check the stationarity of the processes. An error correction model (ECM) was then adopted to measure deviations from the long-run equilibrium to examine the short-term and long-term effects. The resulting models were used to determine the Granger causality between the two processes. Additional information about the two processes was obtained by examining the impulse response function and variance decomposition.
The Dickey-Fuller test supported an implicit null hypothesis that the dengue incidence and Google Trends results are nonstationary processes (
The hypothesis tests in this research indicate that Google Trends results can be used as an initial indicator of upcoming dengue outbreaks.
Dengue is an infectious disease caused by the dengue virus from
Bandung is one of the most crowded cities in Indonesia and has the highest dengue incidence in West Java. Daily habits, landscape structures, weather, and the ecosystem in the city play roles in dengue vector breeding as primary factors for dengue transmission. Bandung has a mountainous climate (humid and cold), with an average temperature of 23.5°C. The average rainfall is 200.4 mm, and there are on average 21.3 rainy days per month. It is an ideal environment for
In this modern world, it is impossible to say that technology, especially the internet, does not influence human lives. Over the years, research has been performed to investigate the accuracy of using internet search engine data to predict real-life phenomena, such as influenza epidemics and flu trends [
According to StatCounter, in 2016, Google was the most used text search engine in Indonesia. About 97% of people who use the internet in Indonesia use Google. It is assumed that Indonesian people tend to search for information about dengue on the internet [
We started our research with an initial hypothesis that the popularity of dengue on Google correlates with dengue cases in Bandung. We then investigated the relationship between these two data sets by using the Dickey-Fuller test, error correction model (ECM), impulse response function, and variance decomposition. We hoped that information from Google Trends could be used for the early detection of upcoming dengue outbreaks so that policymakers can prepare for the early prevention or control of the epidemic.
Google Trends is a website that analyzes the popularity of a topic in various countries and various languages based on search requests. The data are available online, are open, and can be easily accessed by everyone. In Google Trends, a user can enter a keyword in the form of words or phrases related to the selected topic or cases. Google Trends is not case sensitive but takes into account spelling errors that might occur. Users can specify the period they want to review by selecting a time range or specifying a date. In addition, users can specify the area to be reviewed by selecting the appropriate country, city, or province or state. They can also see the popularity of these keywords globally by selecting the option
Data used in this study are time-series data of dengue incidence from Santo Borromeus Hospital in Bandung, as well as popularity data taken from Google Trends via the website (
Dengue data plot from Google Trends and reported cases in Bandung.
We performed a stationarity test for the time-series data of Google Trends (
Differencing a series produces another set of observations, such as the first differenced values, where △
For cointegration, Engle and Granger [
After finding the Google Trends and dengue incidence series to be first-order difference stationary, the long-run equilibrium relationship can be stated in the following form:
where
Let {
After a cointegrating relationship has been established, an ECM can be built to establish the short-run relationship between two variables. A likelihood ratio test can be used to determine the time lag of the vector ECM or the value of
Analysis of cointegration shows that Google Trends and dengue incidence have a long-run equilibrium relationship. However, they are in disequilibrium in the short term. Equations 2 and 3 can be viewed as a vector autoregression (VAR) model as follows:
Hence, the vector ECM at hand can be written as a VAR model as follows:
Before estimating the vector ECM, the optimal lag order is first determined.
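A lag-length comparison of this kind can be sketched in Python. The sketch assumes the small-sample-corrected statistic (T − c)(ln|Σ_r| − ln|Σ_u|), which is a common textbook form and is not confirmed by the text here. The function names and the covariance matrices are hypothetical and are not the study's estimates.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def lr_lag_statistic(sigma_restricted, sigma_unrestricted, n_obs, n_params_per_eq):
    """Likelihood ratio statistic for comparing two lag lengths (assumed form).

    sigma_restricted   : residual covariance matrix of the shorter-lag model
    sigma_unrestricted : residual covariance matrix of the longer-lag model
    n_obs              : number of usable observations (T)
    n_params_per_eq    : parameters per equation in the longer-lag model (c)
    """
    log_ratio = math.log(det2(sigma_restricted)) - math.log(det2(sigma_unrestricted))
    return (n_obs - n_params_per_eq) * log_ratio

# Illustrative covariance matrices only (hypothetical values).
sigma_7 = [[1.10, 0.20], [0.20, 0.90]]   # lag length 7 (restricted)
sigma_8 = [[1.00, 0.18], [0.18, 0.85]]   # lag length 8 (unrestricted)
stat = lr_lag_statistic(sigma_7, sigma_8, n_obs=250, n_params_per_eq=17)

# Dropping one lag from a two-variable system removes 2 x 2 = 4 coefficients,
# so the statistic is compared with a chi-square distribution with 4 df.
CHI2_4DF_1PCT = 13.277
reject_shorter_lag = stat > CHI2_4DF_1PCT
```

Under this form, a statistic above the critical value suggests that the dropped lag carries information and the longer lag length should be kept.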
One way to test causality is to see whether the lags of one variable are relevant for another variable. In a two-equation system with stationary variables
In order to test Granger causality, a standard
In a cointegrated system,
To analyze the dynamic effects of the model in response to shocks and the effects on the two variables, the impulse response function and variance decomposition were examined.
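As a rough illustration of what an impulse response function measures, the sketch below traces a unit shock through a two-variable VAR(1). The coefficient matrix is hypothetical, the shocks are not orthogonalized, and the model is deliberately simpler than the vector ECM estimated in this study.

```python
def mat_mul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]

def impulse_responses(coef, horizon):
    """Return [Phi_0, ..., Phi_horizon] for y_t = coef * y_{t-1} + e_t.

    Phi_h[i][j] is the response of variable i, h periods after a unit
    shock to variable j (non-orthogonalized shocks).
    """
    responses = [[[1.0, 0.0], [0.0, 1.0]]]   # Phi_0 is the identity matrix
    for _ in range(horizon):
        responses.append(mat_mul(coef, responses[-1]))
    return responses

# Hypothetical coefficients: variable 0 = search interest, variable 1 = cases.
A = [[0.5, 0.0],
     [0.3, 0.4]]
irf = impulse_responses(A, horizon=12)             # 12 weekly periods

cases_after_search_shock = [phi[1][0] for phi in irf]
search_after_cases_shock = [phi[0][1] for phi in irf]
```

With these made-up coefficients, a shock to search interest feeds into cases in later weeks, while a shock to cases never reaches search interest because the corresponding coefficient is zero.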
The stationarity test results can be seen in
Dickey-Fuller test for Google Trends data, dengue incidence data, first differenced Google Trends data, and first differenced dengue incidence data.
Variable  Dickey-Fuller test statistic (P value)  Dickey-Fuller critical value (N=250)
Google Trends data  −2.42 (.02)  −2.58
Dengue incidence data  −2.24 (.03)  −2.58
First differenced Google Trends data  −21.76 (.01)  −2.58
First differenced dengue incidence data  −27.85 (.01)  −2.58
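A test statistic of the kind tabulated above can be sketched in Python. The sketch assumes the simplest Dickey-Fuller regression, with no constant and no trend, and the critical value of −2.58 appears consistent with that form. The series is a simulated random walk and not the study's data, and the function names are hypothetical.

```python
import math
import random

def first_difference(series):
    """Return the first differenced series y_t - y_{t-1}."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]

def dickey_fuller_stat(series):
    """t statistic for gamma in dy_t = gamma * y_{t-1} + e_t (no constant, no trend)."""
    lagged = series[:-1]
    diffs = first_difference(series)
    sxx = sum(v * v for v in lagged)
    gamma = sum(v * d for v, d in zip(lagged, diffs)) / sxx
    residuals = [d - gamma * v for v, d in zip(lagged, diffs)]
    dof = len(diffs) - 1                        # one estimated coefficient
    s2 = sum(r * r for r in residuals) / dof    # residual variance
    return gamma / math.sqrt(s2 / sxx)

# Simulated random walk: nonstationary in levels, stationary after differencing.
random.seed(0)
walk = [0.0]
for _ in range(250):
    walk.append(walk[-1] + random.gauss(0.0, 1.0))

stat_level = dickey_fuller_stat(walk)
stat_diff = dickey_fuller_stat(first_difference(walk))
```

A statistic below the critical value rejects the unit root, so the differenced series should yield a far more negative value than the series in levels.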
Ordinary least squares regression for the cointegration test yielded the following long-run equilibrium relationship:
with
Let {
From previous results, it was seen that {
Dickey-Fuller test for the residual sequence.
Variable  Dickey-Fuller test statistic  Dickey-Fuller critical value (N=250)
Residual sequence^{a}  −8.77  −2.58
^{a}residual estimated as follows:
The longest feasible lag length was set as 8 weeks. Thereafter, the value of the determinant of the variance-covariance matrix of a model with lag length eight was examined (denoted as Σ_{8}) and compared with that of a model with lag length seven (denoted as Σ_{7}). The likelihood ratio is (
The results of this test are shown in
Likelihood ratio test for lag length.
Number  Likelihood ratio  Critical value  Verdict
1  7.655  13.277
2  3.291  13.277
3  0.221  13.277
4  2.543  13.277
5  6.191  13.277
6  19.666  13.277
7  27.887  13.277
8  60.361  13.277
After finding the optimal number of lags, an ECM was built. The estimated vector ECM is as follows:
From the equation, it is seen that the speed of adjustment parameter is −0.1816 for {
The speed of adjustment parameter for dengue incidence was nine times larger than the value for Google Trends, meaning that dengue incidence is more responsive to deviations from the longrun equilibrium. On the other hand, Google Trends only responds slightly to the aforementioned deviation.
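To illustrate how speed of adjustment parameters arise, the following Python sketch runs a stripped-down two-step procedure on simulated data: a long-run regression, followed by a regression of each first difference on the lagged residual. It omits the lagged difference terms that a full vector ECM includes, and all names and numbers are hypothetical and are not the study's series or estimates.

```python
import random

def ols_fit(x, y):
    """Least-squares fit y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b

def slope_no_intercept(x, y):
    """Least-squares slope of y on x with no intercept."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Simulated cointegrated pair (illustration only).
random.seed(1)
n = 2000
x, dev = [0.0], [0.0]
for _ in range(n - 1):
    x.append(x[-1] + random.gauss(0.0, 1.0))            # random walk
    dev.append(0.5 * dev[-1] + random.gauss(0.0, 1.0))  # stationary deviation
y = [2.0 * xi + di for xi, di in zip(x, dev)]

# Step 1: long-run regression; residuals are deviations from equilibrium.
a_hat, b_hat = ols_fit(x, y)
resid = [yi - a_hat - b_hat * xi for xi, yi in zip(x, y)]

# Step 2: regress each first difference on the lagged residual.
dy = [y[t] - y[t - 1] for t in range(1, n)]
dx = [x[t] - x[t - 1] for t in range(1, n)]
alpha_y = slope_no_intercept(resid[:-1], dy)   # speed of adjustment of y
alpha_x = slope_no_intercept(resid[:-1], dx)   # speed of adjustment of x
```

In this simulation only `y` corrects toward the equilibrium, so `alpha_y` is clearly negative while `alpha_x` stays near zero. This mirrors the situation in which one series is much more responsive to deviations than the other.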
It was found that this model has an R-squared value of 0.4128 for the Δ
Based on the vector ECM in equation 7, Granger causality was tested between Google Trends and dengue incidence. It was noted that at lags 2 and 3,
The results for 12 periods (3 months) are obtained as presented below.
As shown in
On the other hand, through analysis of the response of dengue incidence to a positive shock, it was found that dengue popularity increases slightly and then remains constant. This behavior is presented in
Generally, the impulse response function shows that Google Trends has a relevant impact on dengue fever incidence and has a long-term effect. On the contrary, dengue incidence has only a short-term and small effect on the popularity of dengue on Google.
Variance decomposition estimates the contribution of shocks in a variable toward the response of another variable. As shown in
On the other hand,
In summary, it can be seen that Google Trends influences dengue incidence in the long term, but dengue incidence only influences Google Trends in the short term and not in the long term. As presented in the model, dengue incidence is related not only to the popularity of dengue in Google but also to its lagged value of up to 1 week.
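The variance decomposition idea can be sketched in Python for a two-variable VAR(1). The sketch assumes Cholesky-orthogonalized shocks with the first variable ordered first. The ordering used in this study is not stated, and the coefficient and covariance matrices below are hypothetical.

```python
import math

def mat_mul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]

def cholesky2(s):
    """Lower-triangular P with P * P^T = s for a 2x2 covariance matrix."""
    p00 = math.sqrt(s[0][0])
    p10 = s[1][0] / p00
    p11 = math.sqrt(s[1][1] - p10 * p10)
    return [[p00, 0.0], [p10, p11]]

def fevd(coef, sigma, horizon):
    """Shares of the horizon-step forecast error variance of each variable
    (rows) attributable to each orthogonalized shock (columns)."""
    p = cholesky2(sigma)
    phi = [[1.0, 0.0], [0.0, 1.0]]
    contrib = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(horizon):
        theta = mat_mul(phi, p)               # orthogonalized responses
        for i in range(2):
            for j in range(2):
                contrib[i][j] += theta[i][j] ** 2
        phi = mat_mul(coef, phi)
    return [[c / sum(row) for c in row] for row in contrib]

# Hypothetical system: variable 0 = search interest, variable 1 = cases.
A = [[0.5, 0.0],
     [0.3, 0.4]]
sigma = [[1.0, 0.0],
         [0.0, 1.0]]
shares_1 = fevd(A, sigma, horizon=1)
shares_12 = fevd(A, sigma, horizon=12)
```

In this made-up system, the share of case variance attributed to search shocks is zero at the first step and grows with the horizon, whereas search variance is explained entirely by its own shocks.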
Impulse response function of (A) dengue data from Google Trends with respect to reported cases and (B) reported cases with respect to dengue data from Google Trends. Forecast error variance decomposition of (C) dengue data from Google Trends and (D) reported cases.
Our results show that there is indeed a causal relationship between dengue popularity in Google Trends and dengue incidence in Bandung. A Granger cointegrated relationship between dengue popularity in Google Trends and dengue incidence in Bandung was noted. This is justified because both data sets were found to be
Based on the ECM, it can be seen that there is a relationship between Google Trends results and dengue incidence. Through Granger analysis, it was seen that Google Trends Granger-causes dengue incidence in Bandung at a lag of 2 and 3 weeks. This was further supported by the impulse response function, where shocks in dengue popularity in Google cause dengue incidence to increase. It was also supported by the variance decomposition, where after 1 week, the contribution from Google Trends to dengue incidence variance increases. Granger analysis also showed that dengue incidence does not Granger-cause its popularity in Google.
The vector ECM also showed that dengue incidence is more responsive to deviations from the long-run equilibrium, since it has a larger value of the speed of adjustment, which is nine times the value for Google Trends.
The results showed a causal relationship between dengue popularity in Google Trends and dengue incidence in Bandung. However, this exact ECM cannot be used for forecasting or early detection owing to the low R-squared values of 0.4128 for the Google Trends equation and 0.1511 for the dengue incidence equation. A further improved model will need to be built for future forecasting.
The results of this study can help provide a more real-time indication of dengue outbreaks in Bandung. Owing to Indonesia’s standard and traditional approach to dengue surveillance, the data of dengue cases have several weaknesses, such as low accuracy and poor timeliness [
Our proposed model relies on strong assumptions, such as high use of gadgets and social media in the community and a good internet signal in the observation area (Bandung in this case). Therefore, it is risky to implement the findings in areas with low internet access.
Google Trends data may be used as an initial indicator of a dengue outbreak in Bandung. However, further improvements to the ECM need to be made by using more data points to gain more extensive insights.
ECM: error correction model
VAR: vector autoregression
Part of this work was supported by the Indonesian RistekDikti Grant 2020. The second author gratefully acknowledges the financial support provided by the Indonesia Ministry of Research and Technology through the Pendidikan Magister menuju Doktor untuk Sarjana Unggul (PMDSU) Program.
MS performed the statistical analysis, review, and interpretation. MF and JTMSE performed numerical simulation and constructed this paper. ES interpreted and reviewed the manuscript.
None declared.