This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
Consumer-grade wearable devices enable detailed recordings of heart rate and step counts in free-living conditions. Recent studies have shown that summary statistics from these wearable recordings have potential uses for longitudinal monitoring of health and disease states. However, the relationship between higher resolution physiological dynamics from wearables and known markers of health and disease remains largely uncharacterized.
We aimed to derive high-resolution digital phenotypes from observational wearable recordings and to examine their associations with modifiable and inherent markers of cardiometabolic disease risk.
We introduced a principled framework to extract interpretable high-resolution phenotypes from wearable data recorded in free-living conditions. The proposed framework standardizes the handling of data irregularities; encodes contextual information regarding the underlying physiological state at any given time; and generates a set of 66 minimally redundant features across active, sedentary, and sleep states. We applied our approach to a multimodal data set, from the SingHEART study (NCT02791152), which comprises heart rate and step count time series from wearables, clinical screening profiles, and whole genome sequences from 692 healthy volunteers. We used machine learning to model nonlinear relationships between the high-resolution phenotypes on the one hand and clinical or genomic risk markers for blood pressure, lipid, weight and sugar abnormalities on the other. For each risk type, we performed model comparisons based on Brier scores to assess the predictive value of high-resolution features over and beyond typical baselines. We also qualitatively characterized the wearable phenotypes for participants who had actualized clinical events.
We found that the high-resolution features have higher predictive value than typical baselines for clinical markers of cardiometabolic disease risk: the best models based on high-resolution features had 17.9% and 7.36% improvement in Brier score over baselines based on age and gender and resting heart rate, respectively (
High-resolution digital phenotypes recorded by consumer wearables in free-living states have the potential to enhance the prediction of cardiometabolic disease risk and could enable more proactive and personalized health management.
The adoption of consumer-grade wearable activity trackers into routine use has been increasing rapidly in recent years, with approximately 1 in 5 adults in the United States reported to regularly use wrist-worn smartwatches and fitness trackers in 2019 [
Previous studies in the cardiometabolic domain have focused on the utility of wearable-derived summary statistics, and fall into 1 of 2 categories. First, electrocardiogram signals from wearables have been studied in relation to the development of cardiometabolic conditions, such as atrial fibrillation [
Rapid and ongoing developments in consumer wearable technology are enabling ever-richer measurements with finer temporal resolution for heart rate, activity, and sleep dynamics in free-living states [
In this study, we aimed to derive high-resolution digital phenotypes from consumer wearable heart rate recordings and to examine their associations with diverse risk markers for cardiometabolic disease. Specifically, we sought to develop a time series feature extraction approach, contextualized by activity state, to meaningfully represent heart rate dynamics recorded by consumer wearables in free-living conditions. We then applied our approach to multidimensional data from normal volunteers in the SingHEART study [
We sourced data from the SingHEART study (NCT02791152) as of October 8, 2019. Enrollment targeted healthy volunteers who provided written informed consent to use the data (including electronic health records) for research. Participants were required to fulfill the inclusion criteria presented in
21-69 years of age
No personal medical history of prior cardiovascular disease (myocardial infarction, coronary artery disease, peripheral arterial disease, stroke), cancer, autoimmune or genetic disease, endocrine disease, diabetes mellitus, psychiatric illness, asthma, chronic lung disease, or chronic infectious disease
No family medical history of cardiomyopathies
At enrollment, each participant was profiled using a range of health assessment modalities. The resulting data set included (1) heart rate and step count time series recordings over 3 to 5 days from consumer wearable devices (Fitbit Charge HR), together with the associated sleep logs generated by Fitbit, (2) self-reported answers to a lifestyle and quality-of-life questionnaire [
We also tracked each participant for the occurrence of any actual clinical event. We extracted all clinical codes (based on the International Classification of Diseases, 10th Revision) pertaining to acute care use events in the regional health system associated with the National Heart Centre Singapore until January 2021 to characterize the links among data features, risk markers, and actual clinical events.
Summary of demographic, clinical, and consumer wearable data for participants with wearable recordings (N=692) in the SingHEART study cohort.
Characteristic | Female (n=370, 53.5%): value, mean (SD) | Female: participants, na (%) | Male (n=322, 46.5%): value, mean (SD) | Male: participants, na (%) |
Age (years) | 45.47 (11.71) | 0 (0) | 44.46 (13.29) | 0 (0) | |
BMI (kg/m2) | 22.87 (3.94) | 0 (0) | 24.33 (3.39) | 0 (0) | |
WCb (cm) | 78.91 (10.98) | 0 (0) | 86.96 (9.86) | 0 (0) | |
SBPc (mm Hg) | 122.51 (17.74) | 0 (0) | 132.20 (14.96) | 0 (0) | |
DBPd (mm Hg) | 73.38 (12.80) | 0 (0) | 82.18 (10.97) | 1 (0.3) | |
Wearable-derived resting heart rate (bpm; Fitbit) | 70.66 (6.55) | 0 (0) | 69.37 (6.59) | 0 (0) | |
ECG_HRe (bpm) | 64.46 (9.17) | 10 (2.7) | 63.67 (9.87) | 12 (3.7) | |
Total cholesterol (mmol/L) | 5.34 (0.94) | 6 (1.6) | 5.33 (0.97) | 5 (1.6) | |
LDLf (mmol/L) | 3.32 (0.81) | 7 (1.9) | 3.40 (0.89) | 6 (1.9) | |
HDLg (mmol/L) | 1.59 (0.32) | 6 (1.6) | 1.36 (0.30) | 5 (1.6) | |
TGsh (mmol/L) | 0.99 (0.51) | 6 (1.6) | 1.30 (0.76) | 5 (1.6) | |
Glucose (mmol/L) | 5.17 (0.49) | 8 (2.2) | 5.36 (0.71) | 5 (1.6) | |
Average daily step counti | 10,349.81 (4180.35) | 30 (8.1) | 10,972.86 (3919.10) | 20 (6.2) | |
Average daily sedentary minutes | 633.45 (96.48) | 102 (27.6) | 656.49 (95.58) | 88 (27.3) | |
Average daily sleep minutes | 395.92 (61.18) | 102 (27.6) | 374.49 (65.15) | 88 (27.3) |
aRefers to number of participants with missing or incomplete values for the respective fields.
bWC: waist circumference.
cSBP: systolic blood pressure.
dDBP: diastolic blood pressure.
eECG_HR: electrocardiogram heart rate.
fLDL: low-density lipoprotein.
gHDL: high-density lipoprotein.
hTG: triglyceride.
iThe average daily step count was derived by taking the sum of steps for each day and then averaging over days. Only days with ≥20 hours of valid data were considered.
The SingHEART study (NCT02791152) was established at the National Heart Centre Singapore, a tertiary specialty hospital in Singapore, and was approved by the SingHealth Centralized Institutional Review Board (ref: 2015/2601 and 2018/3081) [
Given a time series segment, it is possible to define a set of high-resolution features using approaches such as the highly comparative time series analysis [
The Catch22 features fall into seven main categories, namely (1) distribution, (2) extreme events, (3) symbolic, (4) linear autocorrelation and periodicity, (5) nonlinear autocorrelation, (6) successive differences, and (7) fluctuation analysis. The distribution-based features represent summary statistics of the distribution of the measured values in the series (while ignoring the chronological order of these values). The extreme event features represent intervals between successive outlier events in the time series. The symbolic features represent statistics summarizing the outputs of symbolic transformations of the actual time series values. The linear autocorrelation and periodicity features comprise summary statistics on inherent periodicities in the time series. The nonlinear autocorrelation features involve summary statistics on periodicities based on nonlinear transformations of the time series. The successive difference features represent statistics based on the time series of the incremental differences. Finally, the fluctuation analysis features quantify the statistical self-affinity of the time series. Detailed descriptions of each of the 22 features are provided in Table S1 in
We now describe the steps to derive resting heart rate, summary statistics on activity and sleep patterns, and high-resolution features from the wearable heart rate and step count time series recordings. As all these physiological features are derived from the same recordings, they are internally consistent and can be meaningfully used for downstream comparative analyses.
We used wearable heart rate time series recordings to derive resting heart rate [
We extracted the wearable time series recordings for each participant and used only days with at least 20 hours of step count and heart rate data as per Lim et al [
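The day-level validity filter and the average daily step count can be sketched as follows. This is a minimal illustration, not the study's pipeline: it assumes one record per minute, treats a minute as valid when both a step count and a heart rate value are present, and uses a hypothetical record layout `(day, minute, steps, heart_rate)`.

```python
from collections import defaultdict

MIN_VALID_HOURS = 20  # threshold stated in the text


def average_daily_steps(records, min_valid_hours=MIN_VALID_HOURS):
    """Average of per-day step totals over days with enough valid data.

    records: iterable of (day, minute_of_day, steps, heart_rate) tuples,
    one per minute (assumed layout). A minute is counted as valid when
    both steps and heart_rate are not None (assumption).
    Returns None when no day meets the validity threshold.
    """
    valid_minutes = defaultdict(int)
    steps_per_day = defaultdict(int)
    for day, _minute, steps, heart_rate in records:
        if steps is not None and heart_rate is not None:
            valid_minutes[day] += 1
            steps_per_day[day] += steps
    # Keep only days with at least `min_valid_hours` of valid minutes.
    kept_days = [d for d, m in valid_minutes.items() if m >= min_valid_hours * 60]
    if not kept_days:
        return None
    # Sum steps within each day, then average across the kept days.
    return sum(steps_per_day[d] for d in kept_days) / len(kept_days)
```

A day with 10 hours of data is dropped entirely rather than scaled up, so partially recorded days do not bias the average downward.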
Subsequently, we processed the heart rate and step count time series recordings from the consumer wearable devices to yield a range of summary and high-resolution features, as detailed in subsections
Wearable data processing pipeline. (A) Construction of low-resolution features based on summary statistics. (B) Construction of high-resolution features based on the Canonical Time-series Characteristics 22 (Catch22) algorithm. (C) UpSet plot of the 692 participants with features from the various categories. Only nonempty set intersections are presented. Intersection size indicates the number of participants found within the intersections of given sets. Of the largest intersection with 328 participants, 321 also had laboratory measurement recordings.
We used a 3-step procedure to derive a range of wearable summary statistics (
We further developed a data processing pipeline to extract high-resolution time series features from heart rate recordings of the wearable device (
For each participant, we chose the longest uninterrupted period of the heart rate time series recordings for each physical activity state. As the data exhibit significant variability in the lengths of these periods across participants, we defined prespecified lengths to extract standardized sleep, sedentary, and active segments. Specifically, we extracted the first 20 minutes for active segments, the first 1 hour for sedentary segments, and the first 5 hours for sleep segments. If the recordings available for a participant did not fulfill the prespecified length criteria, even with the longest segment for a given activity state, we did not consider that particular activity state for high-resolution analyses. This process yielded up to 3 heart rate time series segments for each participant.
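The segment selection described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the recording is sampled once per minute, each minute already carries an activity state label, and gaps are marked with a state of `None` so that they interrupt a run. The function and variable names are hypothetical.

```python
from itertools import groupby

# Prespecified segment lengths from the text, expressed in minutes.
SEGMENT_MINUTES = {"active": 20, "sedentary": 60, "sleep": 300}


def extract_segments(states, heart_rates, segment_minutes=SEGMENT_MINUTES):
    """Return one standardized heart rate segment per activity state.

    states: per-minute state labels ("active", "sedentary", "sleep", or None).
    heart_rates: per-minute heart rate values, aligned with `states`.
    For each state, the longest uninterrupted run is located and its first
    N minutes are kept; the state maps to None if that run is too short.
    """
    longest = {}
    index = 0
    for state, run in groupby(states):
        length = sum(1 for _ in run)
        if state in segment_minutes and length > len(longest.get(state, [])):
            longest[state] = heart_rates[index:index + length]
        index += length
    segments = {}
    for state, required in segment_minutes.items():
        run = longest.get(state, [])
        segments[state] = run[:required] if len(run) >= required else None
    return segments
```

Returning `None` for a state mirrors the rule that an activity state is excluded from high-resolution analysis when even its longest run is shorter than the prespecified length.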
For each available heart rate time series segment, we applied the Catch22 methodology [
As our study did not prescribe controlled experimental settings for the wearable recordings, the resulting time series segments often exhibit significant noise and irregularities. Hence, we considered the reliability of our feature representation approach in these real-world settings. In particular, we assessed stability and sensitivity of the Catch22 features to the length specifications across activity states (Section SI-1,
We examined how high-resolution wearable-derived heart rate features from sleep, active, and sedentary segments were distributed across study participants.
The first example comprises a nonlinear autocorrelation feature (CO_trev1_num
Illustration of wearable-derived high-resolution heart rate features. Distributions of 6 high-resolution features across the 321 participants, obtained by computing 2 Canonical Time-series Characteristics 22 features on time series recordings from each of the 3 activity levels. The selected participants are at the 2.5th, 25th, 50th, 75th, and 97.5th percentiles of each distribution, and the time series for each selected participant is plotted in the corresponding color. (A-C) CO_trev1_num is the time-reversibility statistic; higher values tend to correspond to “spikier” or more irregular time series. (D-F) DN_HistogramMode_5 groups the z-scored values of a time series into 5 linearly spaced bins and reports the mode of the bins.
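To make the 2 illustrated features concrete, the sketch below gives simplified plain-Python versions of a time-reversibility statistic and a 5-bin histogram mode. These are approximations written for illustration, not the Catch22 reference implementation: they assume z-scoring with the population standard deviation, define time reversibility as the mean cubed successive difference, and break ties between equally full bins by taking the first one.

```python
from statistics import mean, pstdev


def zscore(series):
    """Standardize a series to zero mean and unit (population) SD."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        raise ValueError("constant series cannot be z-scored")
    return [(x - mu) / sigma for x in series]


def trev_1_num(series):
    """Mean cubed successive difference of the z-scored series.

    Values near 0 indicate a series that looks similar forward and
    backward in time; large magnitudes indicate asymmetric jumps.
    """
    z = zscore(series)
    return mean([(b - a) ** 3 for a, b in zip(z, z[1:])])


def histogram_mode(series, n_bins=5):
    """Center of the fullest of `n_bins` linearly spaced bins (z-scored)."""
    z = zscore(series)
    low, high = min(z), max(z)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for value in z:
        # The maximum value falls on the last edge; clamp it into the last bin.
        counts[min(int((value - low) / width), n_bins - 1)] += 1
    fullest = counts.index(max(counts))  # first bin wins ties (simplification)
    return low + (fullest + 0.5) * width
```

For example, a heart rate trace that rises slowly and drops sharply yields a negative `trev_1_num`, whereas a trace that alternates symmetrically yields a value of approximately 0.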
The overall approach used to characterize the predictive value of different wearable-derived features with respect to a variety of clinical risk markers is as follows. Specifically, we considered model types based on 6 different feature sets (
All 321 participants who had a complete set of wearable-derived features also had complete data for the 9 laboratory measurements. We considered this set of 321 participants as our training set to model the clinical risk targets. Of these 321 participants, 149 (46.4%) were not positive for any of the 4 risk markers, whereas 172 (53.6%) were positive for at least one risk marker (Section SI-2,
Description of the different model types.
Model name | Features included | Features, n |
Baseline [ | Age+gender | 2 |
RestingHR | Baseline features+wearable-derived resting heart rate | 3 |
SummaryStats | Baseline features+wearable summary stats | 12 |
HighRes.ActiveSeg | Baseline features+Catch22a (active) | 24 |
HighRes.SedenSeg | Baseline features+Catch22 (sedentary) | 24 |
HighRes.SleepSeg | Baseline features+Catch22 (sleep) | 24 |
aCatch22: Canonical Time-series Characteristics 22.
Laboratory measurements and corresponding thresholds.
Laboratory measurement | Threshold to be considered at risk | |
I. Systolic blood pressure (mm Hg) | >140 | |
II. Diastolic blood pressure (mm Hg) | >90 | |
III. Triglycerides (mmol/L) | >2.3 | |
IV. Total cholesterol (mmol/L) | >6.2 | |
V. HDLa (mmol/L) | <1 | |
VI. LDLb (mmol/L) | >4.1 | |
VII. Fasting blood glucose level (mmol/L) | >6 | |
VIII. Waist circumference (cm) | |
Male | >100 |
Female | >90 |
IX. BMI (kg/m2) | >27.5 |
aHDL: high-density lipoprotein.
bLDL: low-density lipoprotein.
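The thresholds in the table above translate directly into binary risk labels. The sketch below is illustrative only. The dictionary keys are hypothetical names, the grouping of measurements into the bp_abnormal, lipids_abnormal, and obesity targets is an assumption, and the sex-specific thresholds (>100 for male, >90 for female) are assumed to apply to waist circumference in cm, the only one of the 9 measurements not otherwise listed.

```python
import operator

# (hypothetical key, comparison, threshold from the table, assumed target group)
RULES = [
    ("sbp_mmhg", operator.gt, 140, "bp_abnormal"),
    ("dbp_mmhg", operator.gt, 90, "bp_abnormal"),
    ("triglycerides_mmol_l", operator.gt, 2.3, "lipids_abnormal"),
    ("total_cholesterol_mmol_l", operator.gt, 6.2, "lipids_abnormal"),
    ("hdl_mmol_l", operator.lt, 1.0, "lipids_abnormal"),
    ("ldl_mmol_l", operator.gt, 4.1, "lipids_abnormal"),
    ("glucose_mmol_l", operator.gt, 6.0, None),  # counted only in anyRISKoutof9
    ("bmi_kg_m2", operator.gt, 27.5, "obesity"),
]
# Assumption: the sex-specific row refers to waist circumference (cm).
WAIST_CM_THRESHOLD = {"male": 100, "female": 90}


def risk_flags(profile):
    """Map a dict of measurements to binary risk flags; None values are skipped."""
    flags = {"bp_abnormal": False, "lipids_abnormal": False,
             "obesity": False, "anyRISKoutof9": False}
    for key, compare, threshold, group in RULES:
        value = profile.get(key)
        if value is not None and compare(value, threshold):
            flags["anyRISKoutof9"] = True
            if group is not None:
                flags[group] = True
    waist, sex = profile.get("waist_cm"), profile.get("sex")
    if waist is not None and sex in WAIST_CM_THRESHOLD and waist > WAIST_CM_THRESHOLD[sex]:
        flags["anyRISKoutof9"] = True
        flags["obesity"] = True
    return flags
```

All comparisons are strict, following the ">" and "<" signs in the table, so a value exactly at a threshold is not flagged.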
We used machine learning to model the complex nonlinear relationships between a given feature set and the target pairing using 2 separate approaches. First, for any given target, we analyzed the predictive value of different feature sets (
We trained machine learning models to estimate the probability that a participant exhibits clinical risk markers for common cardiometabolic disease abnormalities. Specifically, we used random forest classifiers [
For random forests, variable importance can be quantified using the mean decrease in accuracy (MDA) over all OOB cross-validated predictions. To obtain statistically robust estimates of variable importance, for a given prediction target, we averaged the MDA for each feature across the 200 random forests and then ranked the features by their average MDA to obtain the top 10 important features. To visualize the variable importance results, we considered the union of the top 10 ranking features for the 4 cardiometabolic disease risk targets.
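The aggregation of variable importance can be sketched as follows. This is a minimal illustration that assumes the per-forest MDA values have already been computed and are supplied as one dictionary per forest; the function names are hypothetical.

```python
def top_features(importance_runs, k=10):
    """Rank features by MDA averaged across forests and keep the top k.

    importance_runs: list of dicts mapping feature name -> MDA,
    one dict per trained forest (all with the same keys).
    """
    n_runs = len(importance_runs)
    average = {
        feature: sum(run[feature] for run in importance_runs) / n_runs
        for feature in importance_runs[0]
    }
    return sorted(average, key=average.get, reverse=True)[:k]


def union_of_top_features(runs_by_target, k=10):
    """Union of the top-k features over all prediction targets."""
    union = set()
    for runs in runs_by_target.values():
        union.update(top_features(runs, k))
    return union
```

Because the top 10 lists for different targets overlap only partially, the union can contain anywhere from 10 to 40 features for 4 targets.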
As the risk prediction task is inherently probabilistic, a suitable metric for model performance assessment would emphasize the calibration of the model predictions (ie, the prediction probabilities of true positives and true negatives are close to 1 and 0, respectively). Therefore, we evaluated the accuracy of probabilistic predictions using the Brier score [
where
We used OOB estimates [
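For reference, the Brier score in its standard form is the mean squared difference between the predicted probability and the observed binary outcome, BS = (1/N) Σ (p_i − o_i)^2. The sketch below uses hypothetical argument names and assumes the predicted probabilities are already available (for example, from OOB predictions).

```python
def brier_score(predicted_probabilities, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(predicted_probabilities) != len(outcomes):
        raise ValueError("inputs must have the same length")
    squared_errors = [
        (p - o) ** 2 for p, o in zip(predicted_probabilities, outcomes)
    ]
    return sum(squared_errors) / len(squared_errors)
```

A score of 0 corresponds to perfectly confident and correct predictions, whereas a model that always predicts a probability of 0.5 scores 0.25, which is a useful reference point when reading Brier scores.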
For each target, we also compared the performance of the various model types in relation to each other. Specifically, for each pair of model types, we performed a 2-tailed Welch
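The Welch test statistic for comparing the Brier scores of 2 model types can be sketched as follows. This is a minimal sketch that computes only the t statistic and the Welch-Satterthwaite degrees of freedom; converting these to a 2-tailed P value requires the t distribution, which would come from a statistics library and is not shown here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch t statistic and degrees of freedom for unequal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df
```

Here each sample would hold the Brier scores of one model type across repeated random forest runs, so that the test compares mean performance while allowing the 2 model types to differ in run-to-run variability.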
To better understand the nature of wearable-derived time series features, we investigated their associations with genomic risk markers for cardiometabolic disease. As probing these associations requires handling diverse multidimensional data types with potentially complex nonlinear relationships, we used a machine learning framework (similar to the one described earlier) to model these relationships. We then used model performance measures to infer the degree of information overlap between wearable features and genomic risk targets. As genomic risk is independent of age, we did not include age in any of the models considered.
We categorized the genetic susceptibility to cardiometabolic diseases using polygenic scores (PGSs). To define the genomic risk for lipid abnormalities, blood pressure abnormalities, and obesity, we used the PGS Catalog [
For each of the 3 targets, we labeled a participant as having high genomic risk if their scores for any of the relevant PGS were in the top or bottom decile (refer to Section SI-3,
To evaluate the sensitivity of the chosen percentile cut-offs for genomic risk scores, we repeated the above analyses for 2 additional sets of cut-offs, namely the 80/20 and 85/15 cut-offs.
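The percentile-based labeling of genomic risk can be sketched as follows. This is an illustrative sketch, not the study's code: it assumes that percentile cut-offs are computed within the cohort, that ties at a cut-off count as inside the tail, and that the input is a hypothetical dictionary of PGS identifier to per-participant scores. The `tail_pct` parameter covers the 90/10 (10), 85/15 (15), and 80/20 (20) cut-offs.

```python
from statistics import quantiles


def high_genomic_risk(scores_by_pgs, tail_pct=10):
    """Flag participants in the top or bottom tail of any relevant PGS.

    scores_by_pgs: dict mapping PGS identifier -> list of scores, with
    participants in the same order in every list (assumed layout).
    tail_pct: size of each tail in percent (10 gives the 90/10 cut-offs).
    """
    n_participants = len(next(iter(scores_by_pgs.values())))
    flagged = [False] * n_participants
    for scores in scores_by_pgs.values():
        # 99 cut points; cuts[p - 1] is the p-th percentile.
        cuts = quantiles(scores, n=100, method="inclusive")
        low, high = cuts[tail_pct - 1], cuts[100 - tail_pct - 1]
        for i, score in enumerate(scores):
            if score <= low or score >= high:
                flagged[i] = True
    return flagged
```

Widening the tails from 10% to 20% labels more participants as high risk, which is why repeating the analysis at several cut-offs is a reasonable sensitivity check.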
Finally, we examined the connections between high-resolution wearable-derived features and actualized cardiometabolic disease events for participants not in our training set of 321. Among these participants, we considered those who actualized cardiometabolic disease events indicated by a primary diagnosis of cardiovascular disease, dyslipidemia, and hypertension (as per International Classification of Diseases, 10th Revision codes listed in Table S3 in
For participants selected per the abovementioned criteria, we examined demographic information, physical measurements, genomic risk of disease, and clinical risk markers alongside the wearable-derived features. To interpret how the different wearable-derived features contribute to the model predictions at the individual participant level, we computed the Shapley values (Φ) [
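To illustrate what a Shapley value measures, the sketch below computes exact Shapley values by brute force for a toy model. It is not the procedure used in this study, which would rely on an approximation suited to random forests with many features. The `value` function, which returns the model output when only a given subset of features is known, is a hypothetical stand-in.

```python
from itertools import permutations


def shapley_values(features, value):
    """Exact Shapley values by averaging over all feature orderings.

    features: list of feature names (keep this small; cost is factorial).
    value: function mapping a frozenset of features to a model output.
    """
    totals = {feature: 0.0 for feature in features}
    orderings = list(permutations(features))
    for ordering in orderings:
        coalition = frozenset()
        for feature in ordering:
            extended = coalition | {feature}
            # Marginal contribution of adding this feature to the coalition.
            totals[feature] += value(extended) - value(coalition)
            coalition = extended
    return {f: total / len(orderings) for f, total in totals.items()}


def top_k_by_magnitude(phi, k=5):
    """Feature names with the k largest absolute Shapley values."""
    return sorted(phi, key=lambda f: abs(phi[f]), reverse=True)[:k]
```

The Shapley values of all features sum to the difference between the full-model output and the output with no features known, which is what allows a single participant's prediction to be decomposed into per-feature contributions.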
All statistical analyses and modeling were performed using R Statistical Software (version 4.0.3; R Core Team 2020). Computation of resting heart rate was performed using R, but all other feature engineering efforts such as annotation of wearable time series recordings and derivation of summary features, as well as the generation of high-resolution features, were performed using Python (version 3.8.6).
All Python and R code used in feature generation is available in
Unlike summary statistics such as resting heart rate, which averages heart rate measurements across multiple days, our high-resolution feature sets provide more granularity on the heart rate time series dynamics during different physical activity states (sleep, active, and sedentary).
High-resolution (Canonical Time-series Characteristics 22 [Catch22]) wearable features from 3 different activity states. (A) Frequency polygons of the feature values based on the training set. The colors indicate activity states. (B) Pearson correlation coefficients between pairs of Catch22 features from different physical activity states (sleep, active, and sedentary). Two features from the active period (SC_FluctAnal_2_rsrangefit_50_1_logi_prop_r1 and SC_FluctAnal_2_dfa_50_1_2_logi_prop_r1) are uniformly 0; hence, correlation coefficients involving these 2 features are undefined (white squares).
To study whether this difference holds at the participant level, we characterized the correlations among the high-resolution feature sets obtained during the 3 different activity states. For any given feature (eg, CO_trev1_num), we considered vectors of feature values for each physical activity state across the population (eg, CO_trev1_num.active, CO_trev1_num.sedentary, and CO_trev1_num.sleep). We then calculated the Pearson correlation between these feature vectors for each pair of the activity states. This analysis revealed that the feature values from the different activity states were poorly correlated (
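The cross-state correlation analysis can be sketched as follows. This is a minimal plain-Python illustration with hypothetical names; it returns `None` when either vector has zero variance, which corresponds to the undefined coefficients for features that are uniformly 0 in one activity state.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation; None when either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def cross_state_correlations(values_by_state):
    """Correlate one feature's population vector between each pair of states.

    values_by_state: dict mapping state name -> list of feature values,
    with participants in the same order for every state.
    """
    return {
        (a, b): pearson(values_by_state[a], values_by_state[b])
        for a, b in combinations(sorted(values_by_state), 2)
    }
```

With 3 activity states, each feature yields 3 pairwise coefficients (active-sedentary, active-sleep, and sedentary-sleep).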
Having gained some intuition about the information contained within the wearable-derived feature sets, we considered their predictive value for the clinical markers of cardiometabolic disease risk. Specifically, we trained random forest models on each wearable-derived feature set to classify each of the 4 cardiometabolic disease risk targets, and we compared the resulting models to evaluate the predictive value of the different feature sets.
First, we compared the OOB performance of the models trained using different feature sets for each clinical risk marker target (
Second, we observed that heart rate dynamics extracted from different activity level segments have differential predictive potential for the various targets, as evidenced by the statistically significant differences between Brier scores (
Third, to comparatively evaluate contributions from individual wearable-derived features, we trained models that used all features available to predict each cardiometabolic disease risk target and ranked the variable importance in each case.
Fourth, we observed that the top 10 features for each of the 4 targets included features from all 6 feature types (age and gender, wearable-derived resting heart rate, wearable summary statistics, and the 3 sets of high-resolution features from
Model performance on cardiometabolic risk targets. Out-of-bag model performance for each of the 6 model types computed for the 4 targets. A smaller Brier score indicates a better-performing model for a given target.
Risk target | Baselinea, mean (SD) | RestingHRb, mean (SD) | HighRes.ActiveSegc, mean (SD) | HighRes.SedenSegc, mean (SD) | HighRes.SleepSegc, mean (SD) | SummaryStats, mean (SD) |
anyRISKoutof9 | 0.291 (−5.87×10^−4) | 0.258 (7.7×10^−4) | 0.253 (8.52×10^−4) | 0.239 (−9×10^−4) | 0.245 (8.43×10^−4) | 0.247 (7.66×10^−4) |
bp_abnormal | 0.227 (4.79×10^−4) | 0.223 (5.61×10^−4) | 0.217 (7.88×10^−4) | 0.222 (8.14×10^−4) | 0.225 (8.32×10^−4) | 0.225 (7.9×10^−4) |
obesity | 0.246 (6.64×10^−4) | 0.227 (7.91×10^−4) | 0.221 (8.92×10^−4) | 0.214 (9.34×10^−4) | 0.226 (8.64×10^−4) | 0.227 (8.54×10^−4) |
lipids_abnormal | 0.271 (5.84×10^−4) | 0.261 (6.64×10^−4) | 0.238 (8.08×10^−4) | 0.225 (7.58×10^−4) | 0.241 (8.27×10^−4) | 0.236 (7.3×10^−4) |
aFor each risk target, the Brier scores of the baseline model were significantly different from those of all other models (
bFor each risk target, Brier scores of the resting heart rate model (RestingHR) were significantly different from all other models (
cFor each risk target, Brier scores of the 3 HighRes models were significantly different from each other (
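The relative improvements reported for anyRISKoutof9 follow from the mean Brier scores in the table above. The short sketch below reproduces the arithmetic, with a hypothetical helper name, using the Baseline (0.291), RestingHR (0.258), and HighRes.SedenSeg (0.239) values.

```python
def relative_improvement(reference_brier, model_brier):
    """Percentage reduction in Brier score relative to a reference model."""
    return (reference_brier - model_brier) / reference_brier * 100


# anyRISKoutof9: best high-resolution model versus the 2 baselines.
over_baseline = relative_improvement(0.291, 0.239)
over_resting_hr = relative_improvement(0.258, 0.239)
print(f"{over_baseline:.1f}% over age and gender")
print(f"{over_resting_hr:.2f}% over resting heart rate")
```

Because a smaller Brier score is better, the improvement is expressed as a reduction relative to the reference model's score.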
Random forest variable importance. The variable importance of each feature for prediction of the 4 cardiometabolic disease risk targets. We averaged each importance value across 200 simulations and used the results to rank the top 10 features to retain for each cardiometabolic disease risk target. This resulted in a total of 26 features across all 4 targets, as shown in the figure. Catch22: Canonical Time-series Characteristics 22.
To further interpret the information contained within the wearable-derived features, we sought to understand how they relate to the genetic predispositions for cardiometabolic diseases. Specifically, we examined the degree of information overlap between the different wearable-derived features (
The results are presented in
Degree of association with genomic risk targets. Out-of-bag performance for each of the 6 model types computed for the 3 targets. A smaller Brier score indicates a better-performing model for a given target.
Genomic risk target | Baselinea, mean (SD) | RestingHRb, mean (SD) | HighRes.ActiveSeg, mean (SD) | HighRes.SedenSeg, mean (SD) | HighRes.SleepSeg, mean (SD) | SummaryStats, mean (SD) |
Blood pressure | 0.248 (2.0×10^−3) | 0.245 (8.55×10^−4) | 0.215 (1.08×10^−3) | 0.214 (1.09×10^−3) | 0.215 (9.93×10^−4) | 0.212 (9.64×10^−4) |
Obesity | 0.245 (2.31×10^−3) | 0.246 (9.03×10^−4) | 0.205 (1.15×10^−3) | 0.192 (1.06×10^−3) | 0.199 (1.21×10^−3) | 0.203 (1.06×10^−3) |
Lipids | 0.294 (3.02×10^−3) | 0.308 (6.36×10^−4) | 0.254 (9.07×10^−4) | 0.254 (8.82×10^−4) | 0.259 (8.92×10^−4) | 0.268 (8.86×10^−4) |
aFor each risk target, the Brier scores of the baseline model were significantly different from all other models (
bFor each risk target, Brier scores of the resting heart rate model (RestingHR) were significantly different from those of the 3 HighRes and SummaryStats models (
Finally, we examined the relationship between the wearable-derived feature set most predictive for anyRISKoutof9 and actualized cardiometabolic events. We focused on participants not in our training set and filtered participants with data for the feature set most predictive for anyRISKoutof9 (ie, Catch22 [Sedentary] feature set, based on the abovementioned results). This yielded 197 candidate participants for illustrative profiling. Among these participants, only 5 participants actualized events with primary diagnoses for cardiometabolic conditions (as specified in Table S3 in
First, we describe participants with abnormalities in both genetic and clinical risk markers, namely participants A and B. Participant A had high genomic risk for all 3 conditions, presented abnormal values for most of the 9 clinical risk markers, and was also diagnosed with all 3 types of cardiometabolic conditions considered (cardiovascular disease, dyslipidemia, and hypertension). Participant B had a genomic risk for lipid and blood pressure abnormalities, abnormal lipid panel values, and a clinical diagnosis of dyslipidemia. While participant A had a wearable-derived resting heart rate slightly above the population average, participant B had a wearable-derived resting heart rate lower than the population average. However, in both cases, our HighRes.SedenSeg model predicted a positive anyRISKoutof9 outcome.
Second, we considered participants with no genomic risk but who presented with abnormal clinical risk markers, namely participant C. This participant had high blood pressure, abnormal cholesterol and blood glucose levels, a clinical diagnosis of dyslipidemia, and wearable-derived resting heart rate slightly above the population average value. However, we noted that our HighRes.SedenSeg model predicted a negative anyRISKoutof9 outcome. This could be due to modeling error or possibly be attributed to the absence of severe changes in heart rate dynamics given the normal genetic background and moderate wearable-derived resting heart rate value.
Third, we highlighted participants who did not exhibit any abnormalities in clinical risk markers and were borderline for cardiometabolic disease risk, namely participants D and E. Participant D only had a genomic risk for blood pressure. Participant E, on the other hand, appeared to have the most benign profile with low genomic risk for all 3 target conditions and normal values for all 9 clinical risk markers (with only the BMI being borderline high). Both participants had wearable-derived resting heart rate values that were lower than the population average. Although participants D and E had a seemingly low-risk profile by standard measures, they had clinical diagnoses of dyslipidemia and cardiovascular disease, respectively. Indeed, our HighRes.SedenSeg model predicted a positive anyRISKoutof9 outcome in each case.
Finally, inspecting the most important features (top 5 Shapley values) contributing to model predictions for anyRISKoutof9 in
Illustrative profiles of 5 participants with actualized cardiometabolic events. Participant profiles include demographic information, type of cardiometabolic disease, key physical measurements, clinical and genomic risk markers, and the top 5 important wearable-derived heart rate features (as per Shapley values).
Participant profile | A | B | C | D | E |
Demographic information | | | | | |
Age (years) | 54 | 57 | 56 | 55 | 61 |
Gender | Male | Male | Male | Female | Male |
Wearable-derived resting heart rate | 72.8 | 58.2 | 73.0 | 69.0 | 55.7 |
Physical measurements and clinical risk markers | | | | | |
BMI (kg/m2) | 28.05 | 18.79 | 21.27 | 22.95 | 25.95 |
Blood pressure: SBPa/DBPb (mm Hg) | 166/109 | 108/65 | 164/105 | 112/48 | 133/89 |
Glucose (mmol/L) | 6.8 | 4.8 | 7.4 | 5.3 | 5.3 |
Total cholesterol (mmol/L) | 5.27 | 6.63 | 6.60 | 5.05 | 4.45 |
anyRISKoutof9 | Truec | True | True | Falsed | False |
Genomic risk markers | | | | | |
Lipids abnormalities | True | True | False | False | False |
Blood pressure abnormalities | True | True | False | True | False |
Obesity | True | False | False | False | False |
Type of cardiometabolic disease | | | | | |
Cardiovascular disease | True | True | False | False | True |
Dyslipidemia | True | False | True | True | False |
Hypertension | True | False | False | False | False |
Top 5 important wearable-derived heart rate features (as per Shapley values) | | | | | |
CO_f1ecac.sedentary | False | False | False | True | False |
FC_LocalSimple_mean3_stderr.sedentary | True | False | False | False | False |
SB_MotifThree_quantile_hh.sedentary | True | False | False | False | False |
SB_TransitionMatrix_3ac_sumdiagcov.sedentary | False | False | False | True | False |
CO_trev_1_num.sedentary | False | False | False | False | True |
CO_HistogramAMI_even_2_5.sedentary | False | False | True | False | False |
DN_OutlierInclude_p_001_mdrmd.sedentary | True | True | False | False | False |
CO_Embed2_Dist_tau_d_expfit_meandiff.sedentary | False | True | False | True | True |
DN_HistogramMode_10.sedentary | False | False | True | False | False |
DN_HistogramMode_5.sedentary | True | True | True | True | True |
Gender | True | True | True | False | True |
Age (years) | False | True | True | True | True |
aSBP: systolic blood pressure.
bDBP: diastolic blood pressure.
cTrue indicates true, or the presence of the categorical variable.
dFalse indicates false, or the absence of the categorical variable.
Consumer wearables enable the recording of rich high-resolution physiological dynamics in free-living conditions, but how these data relate to health and disease is not fully understood. We introduced a principled framework to derive high-resolution heart rate features from consumer wearable recordings, and applied our approach to a data set containing multidimensional cardiometabolic health parameters from healthy volunteers. Our results show that, in comparison with typical summary statistics, high-resolution features resolving temporal dynamics and activity-dependent patterns in heart rate have stronger associations with modifiable risk markers and inherent genetic predispositions for cardiometabolic disease alike. Our findings imply that these high-resolution digital phenotypes from consumer wearables can provide a more granular picture of cardiometabolic health and disease states, which could have potential use in cardiometabolic health screening and disease management.
Our framework addresses key challenges in mining wearable data recorded in free-living conditions. Unlike clean data from controlled experimental settings, real-world wearable recordings tend to be irregular, contain missing stretches [
Our framework provides many possibilities for gaining new insights from wearable recordings. Our analyses, using multimodal wearable, genomic, and clinical data from healthy volunteers, highlight 2 possibilities.
First, our results revealed new relationships between high-resolution heart rate dynamics from wearables and the risk of cardiometabolic disease. Most previous studies correlated clinically obtained measures of heart rate dynamics, such as heart rate variability, exercise capacity, and heart rate recovery, with disease risk or outcomes [
Second, our study provides new perspectives on the interrelations between wearable recordings and genetic predispositions in cardiometabolic diseases. Although there has been a longstanding interest in probing gene-lifestyle interactions and their additive effects on cardiovascular disease [
Although the uniquely multimodal nature of our data enables us to uncover many novel insights on high-resolution wearable phenotypes, limitations of data set size and cohort design present some challenges. First, it was infeasible to conduct full-scale gene-environment interaction studies [
In conclusion, we demonstrated that high-resolution digital phenotypes based on heart rate patterns in wearable recordings provide important insights into physiology in free-living conditions. Our results revealed that these measures are associated with both genetic and clinical risk markers of cardiometabolic disease and have additional predictive value beyond wearable-derived summary statistics and clinical measures of cardiometabolic health. Hence, our work expands possibilities to use digital phenotypes from consumer wearables as readily accessible indicators of cardiometabolic health and disease and motivates new approaches for quantitative scoring of cardiometabolic disease risk. Future studies could expand our findings to even higher resolution digital phenotypes that can be extracted from recordings with newer generations of wearable devices [
Supplementary information.
Supplementary data: code for feature generation.
Catch22: Canonical Time-series Characteristics 22
MDA: mean decrease in accuracy
OOB: out-of-bag
PGS: polygenic score
SHAP: Shapley Additive Explanations
This research was supported by funding and infrastructure from the Singapore National Precision Medicine Program (IAF-PP H17/01/a0/007) and the Institute for Infocomm Research, A*STAR. Data acquisition was supported in part by funding from SingHealth, Duke-NUS Medical School, National Heart Centre Singapore, Singapore National Medical Research Council (NMRC/STaR/0011/2012, NMRC/STaR/0026/2015), Lee Foundation, and the Tanoto Foundation.
The authors would like to thank all the volunteers for their participation in this study. They acknowledge valuable data collection assistance from the National Heart Centre Singapore and SingHEART Clinical Research Coordinators and resources from the National Supercomputing Center, Singapore [
JZ was affiliated with the Institute for Infocomm Research at the time of his contribution to this work and is currently affiliated with the Diagnostics Development Hub (DxD Hub) at the Agency for Science, Technology and Research (A*STAR).
WZ, YEC, CSF, PT, WKL, and PK conceived the study. WKL and PK supervised the research. PT, KKY, WKL, and PK acquired funding. JXT, SD, WH, JY, SC, PT, CWC, KKY, and WKL performed data acquisition and data curation. WZ, YEC, CSF, JZ, and PK developed the analysis methodology. WZ, YEC, and JZ wrote software and performed data analysis and visualization. WZ and PK led the manuscript writing, with critical inputs from YEC, CSF, and WKL. All authors interpreted the findings and reviewed and approved the final manuscript. WKL and PK are the corresponding authors of this study and can be reached by email at wengkhong.lim@duke-nus.edu.sg and pavitrak@i2r.a-star.edu.sg, respectively.
None declared.