This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
Screening for influenza in primary care is challenging due to the low sensitivity of rapid antigen tests and the lack of proper screening tests.
The aim of this study was to develop a machine learning–based screening tool using patient-generated health data (PGHD) obtained from a mobile health (mHealth) app.
We trained a deep learning model based on a gated recurrent unit to screen for influenza using PGHD, including each patient’s fever pattern and drug administration records. We also used meteorological data and app-based surveillance of the weekly number of patients with influenza. We defined a single episode as the set of consecutive days that included the day the user was diagnosed with influenza or another disease. Any record a user entered more than 24 hours after his or her last record was considered to be the start of a new episode. Each episode contained data on the user’s age, gender, weight, and at least one body temperature record. The total number of episodes was 6657. Of these, influenza was diagnosed in 3326 episodes. We divided these episodes into a training set (2664/3330, 80%) and a test set (666/3330, 20%). A 5-fold cross-validation was used on the training set.
We achieved reliable performance with an accuracy of 82%, a sensitivity of 84%, and a specificity of 80% in the test set. After the effect of each input variable was evaluated, app-based surveillance was observed to be the most influential variable. The correlation between the duration of input data and performance was not statistically significant (
These findings suggest that PGHD from an mHealth app could be a complementary tool for influenza screening. In addition, PGHD, along with traditional clinical data, could be used to improve health conditions.
With the increasing popularity of mobile health (mHealth), a considerable amount of health-related data are now generated and accumulated outside of hospitals [
Many studies have shown that PGHD have various potential benefits for health care. For example, PGHD may help patients with chronic diseases like diabetes or hypertension take better care of themselves by delivering continuous monitoring and support with more personalized treatment planning [
Although influenza outbreaks can be predicted using PGHD, the diagnosis or screening of individual patients has been conducted using traditional medical devices, such as the rapid influenza antigen test or reverse transcription–polymerase chain reaction (RT-PCR). The rapid influenza diagnostic test (RIDT) has mainly been used as a diagnostic test because of its reduced processing time and easy accessibility [
Fever is regarded as the most distinctive symptom of influenza. Due to the lack of other distinguishable symptoms, it can be challenging to differentiate influenza from other diseases [
We retrospectively collected log data from the Fever Coach app, which is available on Android and iOS [
The data were collected from January 2017 to December 2018. A total of 480,793 users entered 28,010,112 records. During the same period, the number of users diagnosed with influenza at a clinic was 16,432. In 2017 and 2018, 3583 and 12,849 users were diagnosed with influenza, respectively. The log data included body temperature, volume, type and form of antipyretic drugs or antibiotic drugs, sex, age, weight, symptoms, and memos. The users of Fever Coach agreed that their deidentified data could be used for research purposes, and the institutional review board of Samsung Medical Center waived informed consent.
We collected the daily mean temperature, daily maximum temperature, daily minimum temperature, daily mean dew point, daily mean relative humidity, and daily mean pressure data between January 2017 and December 2018 from the Korea Meteorological Administration information portal. The observation point was Seoul 108 [
The Korea Center for Disease Control (KCDC) produces a weekly influenza-like illness report every Tuesday using data received from public health centers during the previous week. These data were collected for the period from January 2017 to December 2018 [
Screenshots of the Fever Coach app.
All of the log data, separated by user ID and year, were then split into episodes. An episode was defined as the set of consecutive days containing the day the user was diagnosed with influenza or another disease. For example, if a user was diagnosed with influenza on February 23, 2018, and recorded his or her body temperature between February 21, 2018, and February 24, 2018, these days were considered to be 1 episode. If the user logged another record more than 24 hours after his or her previous record, that record was considered to be the start of a new episode.
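The 24-hour rule can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `split_into_episodes` is hypothetical, and treating a gap of exactly 24 hours as part of the same episode is an assumption. The timestamps are taken from the episode separation example in this article.

```python
from datetime import datetime, timedelta


def split_into_episodes(timestamps, gap=timedelta(hours=24)):
    """Group log timestamps into episodes.

    A new episode starts whenever a record is logged more than `gap`
    after the previous record (assumption: exactly 24 h stays in the
    same episode).
    """
    episodes = []
    for ts in sorted(timestamps):
        if episodes and ts - episodes[-1][-1] <= gap:
            episodes[-1].append(ts)  # still within the current episode
        else:
            episodes.append([ts])    # first record, or gap exceeded
    return episodes


logs = [
    datetime(2018, 9, 11, 9, 14),
    datetime(2018, 9, 11, 19, 14),
    datetime(2018, 10, 3, 22, 11),
    datetime(2018, 10, 3, 22, 12),
    datetime(2018, 10, 11, 8, 30),
]
episodes = split_into_episodes(logs)
print([len(e) for e in episodes])  # [2, 2, 1]
```

In practice the split would be applied per user ID and year, as described above.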
Each episode had to contain information about the user’s age, gender, and weight. Users were divided into 4 age groups—0-2 years, 2-5 years, 6-12 years, and ≥13 years—to avoid possible overfitting according to age, as age is one of the key factors of influenza propagation. Any episode missing age, gender, or weight was excluded. Moreover, any episode that did not contain at least 1 fever data point was excluded.
We then calculated the app-based weekly influenza surveillance from the influenza-diagnosed episodes each year. The app-based weekly influenza surveillance was defined as the weekly number of reported influenza cases divided by the total number of influenza cases reported in the same year. For example, if there were 3000 reported influenza cases in 2018 and 300 reported influenza cases in week 49 of 2018, the app-based surveillance for week 49 of 2018 was 0.1. We calculated this value for every week of each year and then added it to the corresponding episode. If an episode spanned multiple days, we used the first day of the episode as the representative value, considering that the incubation period of influenza is 1 to 4 days [
We also added meteorological data from the Korea Meteorological Administration. As before, we used values corresponding to the first day of each episode. We added KCDC laboratory surveillance as well, but in this case we used values corresponding to 1 week before the first day of each episode. Due to the reporting delay of the KCDC surveillance, we could not use values corresponding to the same week.
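The weekly ratio and the 1-week lag described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, and weeks are taken to be ISO weeks, which the article does not specify.

```python
from collections import Counter
from datetime import date, timedelta


def app_based_surveillance(diagnosis_dates):
    """Map (year, week) -> weekly influenza cases / annual influenza cases.

    Assumption: weeks and years follow the ISO calendar.
    """
    weekly = Counter(tuple(d.isocalendar())[:2] for d in diagnosis_dates)
    annual = Counter()
    for (year, _week), n in weekly.items():
        annual[year] += n
    return {key: n / annual[key[0]] for key, n in weekly.items()}


def lagged_week(first_day, weeks=1):
    """Return the (year, week) lying `weeks` weeks before `first_day`."""
    return tuple((first_day - timedelta(weeks=weeks)).isocalendar())[:2]


# Mirror the worked example: 3000 cases in 2018, 300 of them in week 49.
cases = [date(2018, 12, 5)] * 300 + [date(2018, 1, 10)] * 2700
ratios = app_based_surveillance(cases)
print(round(ratios[(2018, 49)], 3))     # 0.1
print(lagged_week(date(2018, 12, 5)))   # (2018, 48)
```

The value looked up for an episode would be keyed on the week of its first day for app-based surveillance, and on `lagged_week` of that day for a source with a reporting delay.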
Finally, because the log data contained more noninfluenza episodes than influenza episodes, we set the number of noninfluenza episodes equal to the number of influenza episodes for each year. Data from 2018 were used for training and hyperparameter tuning; these data were randomly split into a training set (2664/3330, 80%) and a test set (666/3330, 20%). A 5-fold cross-validation was used on the training set. Because the influenza epidemic differs slightly each year, we prepared an additional validation set. Whereas the training and test sets comprised data collected in 2018, the additional validation set comprised data collected in 2017, which had a different distribution of weekly reported influenza cases. As with the training and test sets, the additional validation set was balanced 50:50 between influenza and noninfluenza episodes.
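A minimal sketch of the balancing and splitting step is shown below. It assumes simple random downsampling of the larger class and round-robin fold assignment. Neither detail is stated in the article, and the function name and the episode counts in the example are illustrative only.

```python
import random


def balance_and_split(flu, non_flu, test_fraction=0.2, n_folds=5, seed=0):
    """Balance two classes 50:50, then split into train/test and CV folds."""
    rng = random.Random(seed)
    n = min(len(flu), len(non_flu))
    # Downsample so that both classes contribute n episodes (assumption).
    data = [(e, 1) for e in rng.sample(flu, n)]
    data += [(e, 0) for e in rng.sample(non_flu, n)]
    rng.shuffle(data)
    n_test = round(len(data) * test_fraction)
    test, train = data[:n_test], data[n_test:]
    # Round-robin assignment of training episodes to cross-validation folds.
    folds = [train[i::n_folds] for i in range(n_folds)]
    return train, test, folds


# Illustrative IDs only: 1665 influenza episodes and a larger noninfluenza pool.
flu_ids = list(range(1665))
non_flu_ids = list(range(10_000, 15_000))
train, test, folds = balance_and_split(flu_ids, non_flu_ids)
print(len(train), len(test))  # 2664 666
```

With 1665 episodes per class the totals match the 2664/666 split reported above. Each fold in `folds` would serve once as the held-out part during cross-validation.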
Examples of episode separation.
Episodes and the user-added date and time log | Time elapsed since the previous log
Episode 1 |
2018-09-06 22:25 | N/A^a
2018-09-06 22:37 | 0 h 12 min
2018-09-06 23:53 | 0 h 16 min
2018-09-07 1:01 | 0 h 8 min
2018-09-07 2:49 | 1 h 48 min
2018-09-07 10:00 | 7 h 11 min
2018-09-07 15:56 | 5 h 56 min
2018-09-07 21:15 | 5 h 19 min
2018-09-08 11:20 | 14 h 5 min
2018-09-08 12:10 | 0 h 50 min
2018-09-08 21:10 | 9 h 0 min
2018-09-09 12:14 | 15 h 4 min
2018-09-09 21:38 | 9 h 24 min
2018-09-10 9:40 | 12 h 2 min
2018-09-10 21:30 | 11 h 50 min
2018-09-11 9:14 | 11 h 44 min
2018-09-11 19:14 | 10 h 0 min
Episode 2 |
2018-10-03 22:11 | > 24 h
2018-10-03 22:12 | 0 h 1 min
2018-10-03 22:26 | 0 h 14 min
2018-10-03 23:31 | 1 h 5 min
2018-10-04 0:31 | 1 h 0 min
2018-10-04 2:38 | 2 h 7 min
Episode 3 |
2018-10-11 8:30 | > 24 h
2018-10-11 10:10 | 1 h 40 min
2018-10-11 10:12 | 0 h 2 min
2018-10-11 10:14 | 0 h 2 min
2018-10-11 11:35 | 1 h 21 min
^aN/A: not applicable.
Pipeline for data preprocessing. KCDC: Korea Center for Disease Control.
We used GRU-D as our baseline model [
The total number of episodes obtained was 6657, of which 3326 were diagnosed as influenza. The mean episode length was 29.24 (SD 21.79).
General characteristics of the data set.
Variables | Year 2017 | Year 2018
|||
Average number of inputs | 15.05 | 20
Variance in the number of inputs | 16.32 | 18.29
|||
Average number of inputs | 4.578 | 6.040
Variance in the number of inputs | 4.685 | 24.03
|||
At least 1 antibiotic administration | 372 | 4705
No antibiotic administration | 2118 | 1952
|||
0 to 2 | 886 | 2529
2 to 5 | 1328 | 3564
5 to 12 | 262 | 479
Older than 12 | 14 | 85
|||
Male | 1246 | 3348
Female | 1244 | 3309
Based on the GRU-D, the proposed screening algorithm used PGHD (body temperature records, antipyretic drug administration records, and antibiotic drug administration records), app-based surveillance, and meteorological data as the input variables. The area under the receiver operating characteristic (AUROC) curve of the test data set was 0.902, with an accuracy of 82.43% (95% CI 80.28%-84.44%), a sensitivity of 84.20% (95% CI 81.07%-87.00%), a specificity of 80.92% (95% CI 77.85%-83.73%), a positive predictive value (PPV) of 79.05% (95% CI 76.38%-81.50%), and a negative predictive value (NPV) of 85.69% (95% CI 83.26%-87.83%). The confusion matrix and the receiver operating characteristic (ROC) curve are shown in
Confusion matrix for the test set and the additional validation set.
Receiver operating characteristic (ROC) curve illustrating the screening ability of the model. The red line shows a random guess, the blue line is the result of the test set collected in 2018, and the orange line is the result of additional validation using data from 2017. AUROC curve: area under the receiver operating characteristic curve.
Because the influenza epidemic differs slightly each year, we prepared an additional validation set, as described in the Methods section. For the additional validation set, we achieved an area under the curve (AUC) of 0.8647, an accuracy of 77.99% (95% CI 76.31%-79.61%), a sensitivity of 82.35% (95% CI 79.91%-84.61%), a specificity of 74.79% (95% CI 72.46%-77.02%), a PPV of 70.57% (95% CI 68.59%-72.47%), and an NPV of 85.24% (95% CI 83.47%-86.84%).
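For reference, the point estimates reported above follow from the 4 cells of a confusion matrix using the standard definitions. The sketch below uses hypothetical counts, not the confusion matrix of this study, and the function name is illustrative.

```python
def screening_metrics(tp, fp, tn, fn):
    """Standard screening metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
metrics = screening_metrics(tp=80, fp=30, tn=70, fn=20)
print(round(metrics["sensitivity"], 2))  # 0.8
print(round(metrics["specificity"], 2))  # 0.7
print(round(metrics["accuracy"], 2))     # 0.75
```

Note that PPV and NPV depend on the class balance of the evaluated set, which was fixed at 50:50 here.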
We also attempted to evaluate the effect of the input variables on performance in 2 ways. First, we removed them one at a time from all variables. Second, we added them one at a time from baseline variables. To remove them one by one, we first trained the model using all 10 input variables and measured the performance at that time. We then removed 1 input variable and trained the model on the same data set using a total of 9 input variables and measured the performance. We obtained a total of 10 results and summarized them in
The effects of the removal of each variable from the analysis. “–<Variable>” means that the variable was singularly removed from the list of variables for the corresponding experiment.
Variable | Sensitivity | Specificity | AUROC^a | Accuracy | NPV^b | F1
All | 0.8171 | 0.8425 | 0.8931 | 0.8296 | 0.8163 | 0.8300
–Sex | 0.8510 | 0.8028 | 0.8960 | 0.8273 | 0.8387 | 0.8338
–Weight | 0.8171 | 0.8150 | 0.8832 | 0.8161 | 0.8113 | 0.8189
–Age | 0.8333 | 0.8346 | 0.8911 | 0.8339 | 0.8333 | 0.8339
–Fever | 0.8083 | 0.8287 | 0.8882 | 0.8183 | 0.8065 | 0.8191
–Antipyretics | 0.8510 | 0.8058 | 0.8744^c | 0.8288 | 0.8392 | 0.8350
–Anti-viral agent | 0.8304 | 0.8211 | 0.8892 | 0.8258 | 0.8236 | 0.8292
–App-based surveillance | 0.8215 | 0.7905 | 0.8775 | 0.8063^c | 0.8103 | 0.8120^c
–KCDC^d surveillance | 0.8614 | 0.7813^c | 0.8892 | 0.8221 | 0.8446 | 0.8313
–Meteorological | 0.7950^c | 0.8486 | 0.8900 | 0.8213 | 0.7997^c | 0.8191
^aAUROC: area under the receiver operating characteristic.
^bNPV: negative predictive value.
^cThe highest decrease in the value for the corresponding column.
^dKCDC: Korea Center for Disease Control.
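The removal experiment summarized above can be sketched as a loop over variable subsets. This is an illustrative sketch: `evaluate` is a hypothetical callable standing in for training the model on the kept variables and returning a metric, and the stand-in used in the example simply counts variables.

```python
def leave_one_out_ablation(variables, evaluate):
    """Return {label: score} for the full set and for each single removal.

    `evaluate` is a hypothetical callable: it receives the list of kept
    variables and returns a performance score.
    """
    results = {"All": evaluate(list(variables))}
    for removed in variables:
        kept = [v for v in variables if v != removed]
        results[f"-{removed}"] = evaluate(kept)
    return results


variables = [
    "Sex", "Weight", "Age", "Fever", "Antipyretics", "Anti-viral agent",
    "App-based surveillance", "KCDC surveillance", "Meteorological",
]
# Stand-in evaluator for demonstration only: returns the number of inputs.
results = leave_one_out_ablation(variables, evaluate=len)
print(len(results))     # 10
print(results["-Sex"])  # 8
```

The same loop structure applies to the noncumulative addition experiment by starting from a baseline list and appending one variable per run.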
Another experiment was conducted to observe the performance changes by defining the base features and adding the variables one at a time (
Effect of each variable on the analysis. The baseline included body temperature, antipyretic drug, and antibiotic drug data. “+<variable>” means that the variable was added to the baseline for the analysis and then removed for the next analysis (noncumulative addition).
Variable | Sensitivity | Specificity | AUROC^a | Accuracy | NPV^b | F1
Baseline | 0.6018 | 0.7187 | 0.7221 | 0.6592 | 0.6351 | 0.6425
+sex | 0.5678 | 0.7401 | 0.7087 | 0.6524 | 0.6229 | 0.6245
+weight | 0.5734 | 0.7523 | 0.7232 | 0.6619 | 0.6332 | 0.6315
+age | 0.5634 | 0.7477 | 0.7201 | 0.6539 | 0.6229 | 0.6237
+app surveillance | 0.8673^c | 0.7599 | 0.8808^c | 0.8146^c | 0.8467^c | 0.8264^c
+KCDC^d surveillance | 0.7670 | 0.7936^c | 0.8607 | 0.7800 | 0.7666 | 0.7802
+meteorological | 0.8127 | 0.7470 | 0.8712 | 0.7802 | 0.7961 | 0.7888
^aAUROC: area under the receiver operating characteristic.
^bNPV: negative predictive value.
^cThe highest increase in the value for the corresponding column.
^dKCDC: Korea Center for Disease Control.
Finally, we looked at the correlation between the duration of the input data and the screening performance.
Screening performance versus the number of body temperature records. The y-axis shows the percentage of accuracy, and the x-axis refers to the number of body temperatures entered by the user.
With this study, we investigated the possibility of screening for influenza using PGHD, such as body temperature and medication records collected from an mHealth app.
At the beginning of this study, we did not know whether body temperature would change when antipyretics were administered, or if body temperature alone was more important. Although fever is a major symptom of influenza, it is impossible to diagnose influenza using only body temperature changes [
Body temperature is known to be one of the most important symptoms of influenza. However, its effect on the model was not as strong as we expected. A temperature higher than 38.3 °C was recorded at least once in 97.42% (6485/6657) of the episodes in our data. This shows that the majority of users used the app when their children had a fever, which was the original purpose of the app. Of these episodes, 50.82% (3296/6485) were influenza episodes, and 49.18% (3189/6485) were due to other conditions. The mean and variance of body temperature in the patient group diagnosed with influenza were 38.1519 °C and 0.8611 °C, respectively, and the mean and variance of body temperature in the group with other conditions were 38.0449 °C and 0.8367 °C, respectively. There was a significant difference between the 2 groups (
One interesting finding was the effect that sex had on specificity. Although some studies have shown that influenza prevalence differs by gender, the sex ratio in our data was almost equal, with 1677 males and 1660 females diagnosed with influenza. Moreover, when we excluded sex from the input variables, the accuracy and F1 measure did not change significantly. We obtained similar results when we repeated the ablation study. Therefore, further research may be needed to clarify this point.
In summary, age, weight, and gender had little effect on the screening performance. App-based surveillance greatly improved the screening performance, and its effect was nearly identical to that of KCDC laboratory surveillance or meteorological data, which are frequently used as indicators of influenza outbreaks.
This study has several limitations. First, the training and validation data were self-reported by the patients. Most users reported their diagnosis using their smartphones; thus, these data were not reported by clinicians. Therefore, we cannot be certain that the same results would be obtained if hospital-generated data were used. Also, primary care doctors usually use the RIDT instead of RT-PCR to diagnose influenza. As the RIDT has low reliability, our ground truth labels may be noisy. For the deep learning model, if the characteristics of the data at deployment differ even slightly from those of the training data, it is difficult to achieve the performance expected from validation, because the effect of the data distribution and input variables on the model is difficult to analyze [
Screening for influenza can be challenging due to the low sensitivity of rapid antigen tests and the lack of proper screening tests. In this study, we developed a deep learning–based screening tool using PGHD obtained from an mHealth app. The experimental results confirm that PGHD from an mHealth app can be a complementary tool for screening for influenza in individual patients. Because our digital approach can screen patients without physical contact, it could be quite beneficial in screening for new contagious diseases.
App-based surveillance calculated from user input data.
AUC: area under the curve
AUROC: area under the receiver operating characteristic
GRU: gated recurrent unit
ILI: influenza-like illness
KCDC: Korea Center for Disease Control
MERS: Middle East respiratory syndrome
mHealth: mobile health
NPV: negative predictive value
PGHD: patient-generated health data
PPV: positive predictive value
RIDT: rapid influenza diagnostic test
ROC: receiver operating characteristic
RT-PCR: reverse transcription–polymerase chain reaction
HC implemented the code and performed the experiments. MK and JC manipulated the raw data for preprocessing and designed the experiments. JS provided the data and designed the experiments. SYS designed the experiments and supervised the study. All authors wrote the manuscript and discussed the results. HC and MK contributed equally to this work. JS and SYS are co-corresponding authors.
SYS holds stocks in Mobile Doctor, which created the app Fever Coach. SYS also holds stocks in Hurraypositive and Mune, serves as an outside director in Life Semantics, and is a partner of Digital Healthcare Partners. MK is Chief Medical Information Officer of Mobile Doctor and has equity in Mobile Doctor. SJW is the founding member and Chief Executive Officer of Mobile Doctor. SJW is also Chief Executive Officer of Aim Med and a partner in Digital Healthcare Partners. JC was an internship researcher for Mobile Doctor.