Using Smartphones and Wearable Devices to Monitor Behavioral Changes During COVID-19

Background: In the absence of a vaccine or effective treatment for COVID-19, countries have adopted nonpharmaceutical interventions (NPIs) such as social distancing and full lockdown. An objective and quantitative means of passively monitoring the impact and response of these interventions at a local level is needed.


Introduction
On March 11, 2020, the World Health Organization declared the rapidly spreading SARS-CoV-2 virus outbreak a pandemic. This novel coronavirus is the cause of a contagious acute respiratory disease (COVID-19), which was first reported in Wuhan, Hubei Province, China [1][2][3]. As of July 1, 2020, it had infected over 10 million people and spread to 213 countries and territories around the world [4]. Although precise statistics on mortality are being determined, COVID-19 can be deadly with an estimated 1% case fatality rate, and this rate increases for older adults and those with underlying health problems [5,6]. The outbreak of COVID-19 has placed an unprecedented burden on health care systems in most-affected countries and has resulted in considerable economic losses and a possible global recession [7,8].
To date, there is no vaccine or highly effective treatment. The widely adopted strategy has been the use of nonpharmaceutical interventions (NPIs) such as social distancing and even full lockdown to control the spread of the virus and ease the pressure on health and care systems [9,10]. NPIs have been implemented in many countries including China, Italy, Spain, the United Kingdom, and the Netherlands. These measures have been shown to considerably reduce the new confirmed cases [9]. Key to the success of NPIs is the timing of these interventions and the response of the population, both of which might differ among countries and could necessitate further interventions in the case of low compliance either nationally or locally. Furthermore, US $11 trillion of fiscal measures have been announced by more than two-thirds of governments worldwide in an attempt to mitigate the fallout from the pandemic and lockdown [11]. Therefore, we urgently require an objective and quantitative way to monitor population behavior to assess the impact and response of such interventions. Additionally, we need to monitor for the potential effects of a rebound in cases in the winter months as social distancing measures are relaxed and to strategize and understand where course corrections are required. Similarly, understanding potential seasonal forcing of COVID-19 will require a good understanding of different NPIs' effects, so they can be factored out.
The increasing availability of wide-bandwidth mobile networks, smartphones, and wearable sensors makes it possible to collect near real-time high-resolution data sets from large numbers of participants and greatly facilitates remote monitoring of behavior [12][13][14]. By leveraging sensor modalities in smartphones, which include network and GPS location tracking, and Fitbit devices, which include step counts and heart rate, it is possible to assess mobility and even wellness for the population. To manage the data collected from multiple sensor modalities and mobile devices, platforms such as the open-source Remote Assessment of Disease and Relapse (RADAR)-base [15] mobile health platform have been developed [16]. This platform has been used to enable remote monitoring in a range of use cases including central nervous system diseases (major depressive disorder [MDD], epilepsy, and multiple sclerosis [MS]) as part of the Innovative Medicines Initiative (IMI2) RADAR-Central Nervous System (CNS) major program [17,18].
In this paper, we explore the utility of the RADAR-base platform as a toolbox to test the effect and response of NPIs aimed at limiting the spread of infectious diseases such as COVID-19 by leveraging participant data already collected from November 2017 onward as part of the ongoing RADAR-CNS studies [16,17,19]. Specifically, we created measures of mobility (as a proxy of physical distancing), phone use (as a proxy of virtual sociality), and physiological measures (heart rate and sleep), and compared these features among the baseline, prelockdown, and during lockdown periods. Furthermore, we also provide a joint analysis of these features to provide a holistic view and interpret these behavioral changes during COVID-19.

Data Collection
The RADAR-CNS studies were approved by all local ethics committees, and all participants signed informed consent [19]. We included 1062 participants recruited in five European countries: Italy, Spain, Denmark, the United Kingdom, and the Netherlands. Participants in the Netherlands were partially recruited through Hersenonderzoek.nl [20]. The data were collected for the purpose of finding new ways of monitoring MDD (Spain: n=150; the Netherlands: n=103; and the United Kingdom: n=316) and MS (Milan, Italy: n=208; Barcelona, Spain: n=179; and Copenhagen, Denmark: n=106) using wearable devices and smartphone technology to improve patients' quality of life (QoL) and potentially change the treatment of these and other chronic disorders. As we focused on country-level behavioral changes in response to the NPIs, we aggregated data collected in Spain and did not focus on analyzing differences between participants with MDD and MS. Passive participant data, that is, data that did not require conscious participant engagement, were collected continuously on a 24/7 basis through a smartphone and a Fitbit device, and included location, Bluetooth, activity, sleep, heart rate, and phone use data. In this study, we used participants' own Android smartphones where available and provided a Motorola G5, G6, or G7 to participants who had an iPhone or did not have a smartphone. Participants were initially given Fitbit Charge 2 devices; more recently recruited participants were given Fitbit Charge 3 devices once Fitbit Charge 2 devices were no longer available. We asked participants to wear the device on their nondominant hand.
Although not used for this study, active data were also collected, which required clinicians or participants to fill out emailed surveys (eg, Inventory of Depressive Symptomatology [Self-Report]), app-delivered questionnaires (eg, Patient Health Questionnaire), or perform short clinical tests (eg, Expanded Disability Status Scale).
The data collection and management were handled by the open-source mHealth platform RADAR-base [16]. The platform provides high scalability, interoperability, flexibility, and reliability while allowing anyone the freedom to deploy it. Due to the streaming-first nature of the platform, it is also easy to aggregate, analyze, and provide insights into the data in real time, hence making the results of this work potentially deployable for localized monitoring and targeted interventions.

Feature Extraction
To study physical-behavioral changes in response to COVID-19 NPIs, we examined participants' mobility by analyzing relative location and Bluetooth data from smartphones and step count data from Fitbit devices. We investigated phone unlock duration and social app use duration to study online social-behavioral changes. Physiological measures such as sleep and heart rate from Fitbit devices were also analyzed to identify possible changes as a result of lockdown. A full list of features is presented in Table 1. These features were extracted for each participant every day. The daily features were calculated using the data from 6 AM on the present day to 6 AM on the next day for all features except total sleep duration and bedtime, where 8 PM was used as the starting time point and 11 AM the finishing. When no data were found in a data modality for a participant on a day due to the participant not wearing the Fitbit device or not using the smartphone, we did not calculate the feature derived from that data modality on that day.
The smartphone-derived location data were sampled once every 5 minutes by default, with longer sampling intervals dependent on network connectivity. Spurious location coordinates were identified and removed if they differed from preceding and following coordinates by more than five degrees. Home location was determined daily by clustering location data between 8 PM and 4 AM, using the mean coordinate of the cluster to which the last coordinate belonged. This choice was made because the largest cluster may not be the home location for a single night, whereas the last location before the phone shut down was more likely to be the home location for that night. The clustering was implemented using density-based spatial clustering of applications with noise [21]. A duration bounded by two adjacent coordinates was regarded as a valid homestay duration on the condition that both coordinates were no further than 200 meters from the home location. Durations longer than 1 hour were excluded because, relative to the 5-minute sampling interval, they indicate a large proportion of missing data. All valid homestay durations between 8 AM and 11 PM were summed to calculate daily homestay. Daily maximum distance from home was also computed based on the coordinates in the same period.
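The homestay rules described above can be illustrated with a minimal Python sketch. This is not the RADAR-base implementation: the function names, the (timestamp in seconds, latitude, longitude) input format, and the reading of "more than five degrees" as a jump in either latitude or longitude are all assumptions made for illustration. The home location is taken as given rather than derived by clustering, and the restriction to the 8 AM to 11 PM window is left to the caller.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two coordinates."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def drop_spurious(points, max_jump_deg=5.0):
    """Drop interior points that jump away from BOTH neighbours.

    Assumption: a 'jump' is more than max_jump_deg in latitude or longitude.
    points is a time-sorted list of (timestamp_s, lat, lon).
    """
    def jump(a, b):
        return max(abs(a[1] - b[1]), abs(a[2] - b[2]))

    kept = []
    for i, p in enumerate(points):
        interior = 0 < i < len(points) - 1
        if (interior
                and jump(p, points[i - 1]) > max_jump_deg
                and jump(p, points[i + 1]) > max_jump_deg):
            continue
        kept.append(p)
    return kept

def homestay_seconds(points, home, radius_m=200.0, max_gap_s=3600):
    """Sum gaps between adjacent points that are both within radius_m of home.

    Gaps longer than max_gap_s are skipped as likely missing data.
    home is a hypothetical, already-estimated (lat, lon) pair.
    """
    total = 0
    for (t0, la0, lo0), (t1, la1, lo1) in zip(points, points[1:]):
        gap = t1 - t0
        if gap > max_gap_s:
            continue
        if (haversine_m(la0, lo0, *home) <= radius_m
                and haversine_m(la1, lo1, *home) <= radius_m):
            total += gap
    return total
```

Applying `drop_spurious` before `homestay_seconds` means that a single outlier does not split an otherwise continuous stay at home into two shorter intervals.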
Bluetooth data, including the number of nearby and paired devices, were also collected from smartphones, which were sampled every hour. The daily maximum number of nearby devices was used as a mobility feature. An increased number of nearby devices (typically other phones) detected may indicate other users' presence in the vicinity, which therefore can serve as a proxy of physical distancing.
In addition to mobility features extracted from smartphones, daily step count was taken from the Fitbit device, which was computed as the total steps a participant walked every day. Likewise, daily sleep duration was computed as the summation of three Fitbit-output sleep categories (light, deep, and rapid eye movement) sampled every 30 seconds from 8 PM to 11 AM the next day. Bedtime was defined as the time of the first sleep category reported by Fitbit after 8 PM. Note that the sleep categories referred to the sleep stages provided by the Fitbit application programming interface [22], which are not equivalent to the medical sleep stages. Finally, daily mean heart rate was calculated by averaging the Fitbit-output heart rate readings, sampled every 5 seconds at best. This sampling interval may be longer depending on Fitbit proprietary algorithms for remaining battery level, quality scoring, and network connectivity.
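As a rough illustration of the sleep features, the sketch below counts 30-second epochs labeled with a sleep category inside the 8 PM to 11 AM window and takes the first such epoch as bedtime. The function name, the (datetime, stage) input format, and the label strings are assumptions for this example, not the study's code.

```python
from datetime import date, datetime, time, timedelta

# Assumed label strings for the three sleep categories; anything else
# (for example "wake") does not count toward sleep.
SLEEP_STAGES = {"light", "deep", "rem"}
EPOCH = timedelta(seconds=30)  # one sleep sample covers 30 seconds

def sleep_features(epochs, day):
    """Return total sleep and bedtime for the night starting on `day`.

    epochs: iterable of (datetime, stage) samples.
    The window runs from 8 PM on `day` to 11 AM the next day.
    Returns None when no sleep epoch falls inside the window.
    """
    start = datetime.combine(day, time(20, 0))
    end = start + timedelta(hours=15)  # 11 AM the next day
    asleep = sorted(
        ts for ts, stage in epochs
        if stage in SLEEP_STAGES and start <= ts < end
    )
    if not asleep:
        return None
    return {"total_sleep": EPOCH * len(asleep), "bedtime": asleep[0]}
```

Returning `None` for an empty window mirrors the rule that a feature is not calculated on days when the underlying modality has no data.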
To explore changes in phone use, daily unlock duration was calculated by summing time periods starting with the unlocked state and ending with the standby state. Single intervals longer than 4 hours were excluded, which might result from a missing standby state or unintentionally leaving the phone unlocked. App use was quantified by classifying apps according to categories listed on Google Play [23]. As we were particularly interested in cyber-social interactions, we focused on the daily use time of social apps, including the Google Play categories of Social, Communication, and Dating. Among them are Facebook, Instagram, and WhatsApp.
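The unlock duration rule can be sketched as follows. The event format of (timestamp in seconds, state), the state strings, and the choice to keep the first of several consecutive unlocked events are assumptions for illustration only.

```python
def daily_unlock_seconds(events, max_interval_s=4 * 3600):
    """Sum unlocked-to-standby intervals, skipping any longer than 4 hours.

    events: iterable of (timestamp_s, state), state being "unlocked" or
    "standby" (assumed labels). A standby event without a preceding
    unlocked event is ignored.
    """
    total = 0
    unlocked_at = None
    for ts, state in sorted(events):
        if state == "unlocked":
            # Assumption: keep the first unlock if several arrive in a row.
            if unlocked_at is None:
                unlocked_at = ts
        elif state == "standby" and unlocked_at is not None:
            interval = ts - unlocked_at
            if interval <= max_interval_s:
                total += interval
            unlocked_at = None
    return total
```

Keeping the first unlock makes a missing standby event produce a long interval, which the 4-hour cap then discards rather than counting it as use.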

Data Analysis
We plotted how the features evolved over 1.5 years for each country investigated. The participants' daily median, 25th percentile, and 75th percentile of each feature were calculated and then plotted. A minimum of 20 participants' data points was a prerequisite for calculation for any given day to reduce variance and noise. To facilitate interpretation, we also marked time points of public announcements related to lockdown policies [24].
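A minimal sketch of the daily summary, assuming per-participant daily feature values are available as (day, participant ID, value) records. The function name and record layout are hypothetical. Quartiles are computed with the inclusive (linear interpolation) method from the Python standard library, which may differ slightly from the method used for the published plots.

```python
from collections import defaultdict
from statistics import quantiles

def daily_summary(records, min_participants=20):
    """Per-day 25th percentile, median, and 75th percentile across participants.

    records: iterable of (day, participant_id, value).
    Days with fewer than min_participants contributing are omitted.
    """
    by_day = defaultdict(dict)
    for day, pid, value in records:
        by_day[day][pid] = value  # one value per participant per day

    summary = {}
    for day, values in sorted(by_day.items()):
        if len(values) < min_participants:
            continue
        q1, med, q3 = quantiles(values.values(), n=4, method="inclusive")
        summary[day] = {"p25": q1, "median": med, "p75": q3}
    return summary
```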
To examine changes in mobility, physiological measures, and phone use induced by the lockdowns, comparisons among baseline, prelockdown, and during lockdown on the daily median of each feature were carried out using Kruskal-Wallis tests followed by post hoc Dunn tests [25,26]. For the during lockdown phase, we chose the entire period of the national lockdown in each country, which ended when NPIs were eased for the first time. For the prelockdown phase, we chose the period immediately prior to the first restrictive measure, of the same length as the entire national lockdown. For the baseline phase, we chose the same period in 2019 as the 2020 national lockdown for countries starting to collect data earlier than 2019, which included Italy, Spain, and the United Kingdom. This was aimed at suppressing seasonal variability. For Denmark and the Netherlands, where participant recruitment and data collection started much later, we chose the period that started with the earliest stable date (no considerable missing data or outliers), of the same length as the entire national lockdown. If a significant difference among these three periods was found after Benjamini-Yekutieli correction for the number of features (n=9), a post hoc Dunn test was applied with Benjamini-Yekutieli correction for the number of groups (n=3) [27]. Box plots were used to present the results. A P<.05 after Benjamini-Yekutieli corrections was deemed statistically significant. It should be noted that we applied corrections resulting from multiple comparisons and multiple features in each country.
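For readers unfamiliar with the Benjamini-Yekutieli procedure, the sketch below shows one common way to compute adjusted P values from a list of raw P values (for example, one Kruskal-Wallis P value per feature). It is a generic illustration with a hypothetical function name, not the code used in this study. Each sorted P value is multiplied by m * c(m) / rank, where m is the number of tests and c(m) is the harmonic sum 1 + 1/2 + ... + 1/m, and monotonicity is then enforced from the largest P value downward.

```python
def benjamini_yekutieli(pvalues):
    """Return Benjamini-Yekutieli adjusted P values in the input order."""
    m = len(pvalues)
    c_m = sum(1.0 / k for k in range(1, m + 1))  # harmonic correction factor
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from largest P value to smallest
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted

# Usage: a result counts as significant when its adjusted P value is below .05.
raw = [0.001, 0.04, 0.03]
significant = [p < 0.05 for p in benjamini_yekutieli(raw)]
```

In this toy input, only the first raw P value remains below .05 after adjustment, which shows how the correction is more conservative than applying the threshold to raw P values.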
We also studied factors that might influence the subpopulation behavioral features during the lockdown period. The investigated factors included age, gender, BMI, and educational background. For age groups, we defined the young group as younger than 45 years and the older adult group as 45 years or older. For BMI groups, the low BMI group was defined as less than 25, and the high BMI group as greater than or equal to 25. For education groups, we defined the degree group as having a bachelor's degree or above and the nondegree group as having lower qualifications. Furthermore, we defined a combined factor group of young men, as this subpopulation was suspected to be less compliant with social distancing measures. Here we focused on features of homestay and daily step count during the entire period of lockdown for each country. We performed Wilcoxon signed rank tests on these two features to examine statistically significant differences. The P values were corrected with the number of factors (n=5) and the number of features (n=2) using Benjamini-Yekutieli correction.
Finally, we investigated the effects of different NPIs, in particular immediately after national lockdowns. This was done by comparing the NPIs implemented in the five countries within the first 2 weeks after entering national lockdowns.

Results
Plots showing how the extracted features evolved from February 1, 2019, to July 5, 2020, and box plots of these features are shown in Figures 1-5 and in Figure 6, respectively. Detailed test statistics and P values comparing prelockdown and during lockdown measures are presented in Table 2. Figure 7 shows a zoomed-in version of Figures 3 and 4.
Through RADAR-base, we quantified changes in mobility, phone use, and physiological measures as a result of NPIs introduced to control COVID-19. As expected, following national lockdowns, participants in all countries stayed at home for longer, travelled shorter distances, walked less, and had fewer Bluetooth-enabled devices in the vicinity.
In contrast to increased physical distancing (reduced sociability) suggested by these mobility features, higher phone use, indicating compensatory sociability, was observed. Italy, Spain, and the United Kingdom saw longer unlock duration, and these 3 countries together with the Netherlands also showed longer social app use duration. Tellingly, both unlock duration and social app use duration saw peaks around the news of national lockdowns in all countries.
Concurrent with the changes in mobility and phone use, changes in physiological measures were observed. Participants in Spain, Italy, and the United Kingdom went to bed later and slept more. Participants in Spain, Italy, and Denmark also had a decrease in heart rate. Although not statistically significant, an increase in sleep duration and bedtime in Denmark and the Netherlands, and a decrease in heart rate in the United Kingdom and the Netherlands can be seen in Figures 3-5.
Differences across countries also existed in the implemented NPIs. The requirement of staying at home except for essential trips and the cancellation of public events were implemented in all countries but Denmark, where they were only recommended. Working places were required to close for some sectors in Spain, the United Kingdom, and Denmark, and were required to close for all but essential work in the Netherlands and Italy. Public transport was recommended to close in Italy, Spain, and Denmark. Among all countries, Spain had the least strict restrictions on gatherings and school closures (only geographically targeted).
We observed that the young group spent more time at home in Italy, Spain, and the United Kingdom, and degree holders spent more time at home in Italy and Denmark. The young group took fewer daily steps in Italy, the United Kingdom, and the Netherlands; the low BMI group took fewer daily steps in Italy, Spain, Denmark, and the United Kingdom; the young men group took fewer daily steps in Italy, the United Kingdom, and the Netherlands. Participants educated to degree level walked more in the United Kingdom and the Netherlands but less in Italy.
The detailed results are presented in Table 3.

Principal Findings
We quantitatively investigated COVID-19 and associated lockdown-related changes in mobility, physiological measures, and phone use features derived from passive data collected through mobile devices (smartphones and wearable Fitbit devices) of participants recruited in five European countries to the RADAR-CNS program. We were able to measure significant changes in behavioral features between baseline, prelockdown, and during lockdown periods. As well as confirming expected changes such as spending more time at home, travelling much less, and having far fewer nearby devices, we observed that people were more active on their phones, interacting with others through social apps particularly around major news events such as national lockdown, suggesting physical but maybe less social distancing. Furthermore, participants had lower heart rates, slept more, and went to bed later. In addition, we found that younger people spent more time at home and took fewer daily steps. Participants with lower BMI took more steps while maintaining comparable homestay with the higher BMI group. With 5 billion global smartphone users and 500 million smartwatch and wearable device users [28,29], we propose that the ability to generate metrics such as these is vital for evaluating NPI efficacy.
Our mobility analyses are in line with Google mobility reports [30], where substantial reductions in mobility and increase in residential stays during lockdown periods were found in Italy, Spain, and the United Kingdom; Denmark and the Netherlands by comparison showed an increase in mobility trends for parks and a relatively small increase in residential stays. However, in comparison to Google mobility reports, which provide valuable aggregated data for short periods, RADAR-base is an open-source highly configurable platform that supports collection and analysis of participant-level mobile and phone data in near real time with a potential for targeted interventions. Specifically, focused testing and tracing may be directed to people perceived to be at high risk based on their behavior. In addition, RADAR-base was also used to collect self-reported questionnaires related to emotional well-being, functional status, and disease symptom severity of its participants [19]. Since April 2020, new questionnaires have been distributed to specifically assess COVID-19 symptoms and diagnosis status of our RADAR-CNS research participants. Our future work will use the entirety of these data to investigate the potential of wearable data, such as digital early warning signs of COVID-19, the impact of COVID-19 on the QoL, and the clinical trajectories of their primary diagnosis (MDD or MS).
The difference in the response across nations may reflect differences in the implementation of NPIs, media communication, and cultural differences. Denmark implemented stricter restrictions on working places and public transport but was less strict on homestay and public events [24]. In contrast, Spain was more flexible on restrictions of group gatherings and school closures. The contrast in the implementation of different NPIs between the two countries showing distinct behaviors during lockdown sheds light on which NPIs might be more productive in promoting physical distancing. This shows the potential utility of RADAR-base for remotely monitoring the effect of different NPIs, and we also saw evidence of this in our data with participants beginning to return to prelockdown routines as NPIs were lifted. Future work will compare these differences within a country and across countries, which may further elaborate on the effect and impact of NPIs on infection rates and potential second waves.
It is interesting to note that the younger group in general stayed at home more and took fewer steps than the older group. Since most countries required staying at home except for essential trips, one reason could be that the older adults, often less experienced in using online shopping, had to go out for groceries. This conjecture requires future work to investigate. Those educated to the degree level stayed at home for significantly longer in Italy and Denmark, possibly reflecting higher employment in sectors better able to work from home. The low BMI group took more steps but retained similar homestay to the high BMI group, which suggested they may have found other means to exercise locally. This information helps us understand the effectiveness of the NPIs at a subpopulation level and may be useful in informed strategies for targeted NPIs.
The ability to simultaneously manage multiple data modalities in RADAR-base facilitates the joint analysis and interpretation of them together with NPIs. The decrease in heart rate may be explained by the concurrent reduction in steps, the increase in homestay, and total sleep duration. The reduction in mobility, coupled with an increase in phone use, could possibly serve as indicators of physical distancing observance and resultant compensated social interaction. The delayed bedtime might be related to children homeschooling as a result of school closure, increased phone use, and a lack of exercise. As such, RADAR-base can also be applied to monitor the population health when jointly interpreting features such as step count, sleep duration, and bedtime, which is vital if the social distancing is implemented for a longer duration.
Finally, it has been shown that an elevated resting heart rate may suggest acute respiratory infections [31]. It may be possible to infer an infection by continuously monitoring heart rate, especially when the population remains indoors for the vast majority of the time. Such monitoring provides the possibility to generate early warning signals for symptomatic or presymptomatic respiratory infections, thereby aiding timely self-isolation or treatment. The COVID-19-related symptom and diagnosis questionnaires have been added to the study and may provide a means to investigate these relationships further.

Media Effects
In addition to changes in trends over longer periods, we also identified interesting findings in relation to specific events (see Figure 7). A dramatic reduction in total sleep duration was observed in Denmark around March 11, 2020, which may be related to the announcement of the pending lockdown on that day and a 185% increase in the confirmed cases in Denmark on the previous day. Another example can be seen just after the mitigation phase was announced in the United Kingdom on March 12, in which social distancing was not strongly recommended, yet we saw participants isolating themselves voluntarily by staying at home for much longer. These observations highlight the potential role of media and social media in the distribution of information that may precipitate certain behavior. This observation may also explain the significant difference between the baseline and prelockdown phases, and suggests that people may have acted ahead of further government restriction. Furthermore, this is accompanied by a marked loss of weekday and weekend periodic structure in the prelockdown and during lockdown periods.

Limitations
There are some issues to consider concerning this work. First, the participants included in this study have different medical conditions (MDD or MS), which led to different baseline levels across countries. Nevertheless, as the focus of this study is the changes in the prelockdown and during lockdown phases relative to the baseline, we were still able to identify and compare the changes induced by lockdowns. We also analyzed the data collected in Spain split into MDD and MS separately and found the results only differed marginally. Understanding of any artifacts or effects introduced into the RADAR-CNS data by the NPIs will be crucial in RADAR-CNS being able to deliver its aim of identifying signals that predict and prevent MDD and MS. Although the medical conditions of the population in this study might not be fully generalizable to a wider population healthy or with other conditions, we were able to demonstrate the utility of RADAR-base in monitoring behavioral changes, which can be readily generalized to other cohorts.
Second, the individual disease status at baseline may be different from that of the during lockdown period in each country, which might complicate the comparisons. To mitigate this, we used the same time period from the previous year to suppress the seasonal variability. We believe this complication on the population level was unlikely to be large, especially compared to the impact of lockdowns. Third, participants recruited at different times may use different devices for smartphones and Fitbit depending on the availability and enrollment dates, which might make it difficult for interparticipant comparisons. Yet, this work focused on the population-level behavioral changes induced by NPIs where the handset variability was less of a factor.
Fourth, on account of requirements for participants' privacy in the RADAR-CNS studies, location data were purposely obfuscated with a participant-specific random value preventing precise localization of the participants, which limited use of regional geographic factors within a country. It would be interesting to examine how specific regions react to lockdowns when these data are available in future work.
Fifth, limited sample sizes in certain countries and data loss impacted the smoothness of the plots showing how the extracted features evolved over time. The plots for Denmark and the Netherlands showed relatively large variance particularly in the early phase, as these sites have only recently begun recruiting. Several dips and spikes in step counts and heart rate were seen in all countries during July and August. These were due to data loss caused by connectivity issues with the Fitbit server during this time.
Last, we only explored a subset of features that can be derived from smartphones and Fitbit wearable devices. Future work will investigate whether other features offer additional information for a more complete description of lifestyle changes.

Conclusions
Using participants' data from smartphones and wearable devices collected and managed by RADAR-base over 1.5 years covering the outbreak and subsequent spread of the COVID-19 pandemic across five European countries, we were able to detect and monitor the physical-behavioral and social-behavioral changes in response to the NPIs. We found that, as well as expected findings (that validated the data collection platform) relating to increased time spent at home, less travel, and fewer nearby Bluetooth-enabled devices, participants were more active on their phones, in particular, interacting with others using social apps, especially around major news events, suggesting increased physical distancing with socialising and interaction moving online. Furthermore, we found that participants had lower heart rates, slept more, and went to bed later. We demonstrated different responses across countries, with Denmark showing attenuated responses to NPIs compared to other countries, which may be associated with its different focus in implementing NPIs. We found that younger people stayed at home for longer yet walked less compared to older adults and that people with lower BMI remained more active during lockdown while having comparable homestay to their counterparts with higher BMI. Joint analysis of the extracted features is important for evaluating aspects of NPI performance during their introduction and any subsequent relaxation of these measures. This work demonstrates the value of RADAR-base for collecting data from wearables and mobile technologies to understand the effect and response of public health interventions implemented in response to infectious outbreaks such as COVID-19. This ability to monitor response to interventions, in near real time, will be particularly important in understanding behavior as social distancing measures are relaxed as part of any COVID-19 exit strategy.
Future work will include using participants' responses to COVID-19-related questionnaires together with an expanded feature set to gain a more specific understanding of the relationship between mobile device-derived features and COVID-19 symptoms.