This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
Pancreatic cancer is the third leading cause of cancer-related deaths, and although pancreatectomy is currently the only curative treatment, it is associated with significant morbidity.
The objective of this study was to evaluate the utility of wearable telemonitoring technologies to predict treatment outcomes using patient activity metrics and machine learning.
In this prospective, single-center, single-cohort study, patients scheduled for pancreatectomy were provided with a wearable telemonitoring device to be worn prior to surgery. Patient clinical data were collected and all patients were evaluated using the American College of Surgeons National Surgical Quality Improvement Program surgical risk calculator (ACS-NSQIP SRC). Machine learning models were developed to predict whether patients would have a textbook outcome and compared with the ACS-NSQIP SRC using area under the receiver operating characteristic (AUROC) curves.
Between February 2019 and February 2020, 48 patients completed the study. Patient activity metrics were collected over an average of 27.8 days before surgery. Patients took an average of 4162.1 (SD 4052.6) steps per day and had an average heart rate of 75.6 (SD 14.8) beats per minute. Twenty-eight (58%) patients had a textbook outcome after pancreatectomy. The group of 20 (42%) patients who did not have a textbook outcome included 14 patients with severe complications and 11 patients requiring readmission. The ACS-NSQIP SRC had an AUROC curve of 0.6333 to predict failure to achieve a textbook outcome, while our model combining patient clinical characteristics and patient activity data achieved the highest performance with an AUROC curve of 0.7875.
Machine learning models outperformed ACS-NSQIP SRC estimates in predicting textbook outcomes after pancreatectomy. The highest performance was observed when machine learning models incorporated patient clinical characteristics and activity metrics.
Pancreatectomy is a particularly complex operation with a 90-day mortality rate over 4% and serious morbidity rates over 20%, even in high-volume centers [
Patients undergoing pancreatectomy have an increased risk of postoperative complications if they have poor preoperative physical health and overall performance [
Recently published data have demonstrated that telemonitoring using wearable devices with a 3-axis accelerometer and photoplethysmogram sensors can provide real-time data on patient activity metrics, which can holistically capture a patient’s physical health status [
For patients undergoing pancreatectomy, this technology has the potential to improve patient selection. To evaluate the relationship between longitudinal patient activity data and surgical outcomes, our team implemented a protocol in which patients were provided with wearable telemonitoring devices before undergoing pancreatectomy at our institution, and we evaluated how well the collected data predicted postoperative outcomes. Herein, we present a prospective cohort study of patients undergoing pancreatectomy over a 12-month period.
From February 2019 to February 2020, eligible patients were recruited from multidisciplinary pancreas clinics. Men and women of all races and ethnic groups were eligible for this trial. Patients were included if they (1) were scheduled to undergo pancreatic resection, (2) had access to a smartphone, (3) were at least 18 years of age, and (4) were able to understand and willing to sign an institutional review board (IRB)–approved informed consent document (IRB #201810002).
We conducted a prospective, single-center, single-cohort trial evaluating the utility of telemonitoring devices to measure daily activity in patients undergoing pancreatectomy. The device used in this study was the Fitbit Inspire HR (Fitbit, Inc), which was selected because it provides remote data access from the device with a set frequency and enhanced granularity. It is also a waterproof, inexpensive, consumer-based device that is designed to be compatible with most smartphones. At the time of consent, study patients were provided with a telemonitoring device and assisted in setting it up with their smartphone. Pancreatectomy typically took place more than two weeks after surgical consent, which in most cases provided at least two weeks of preoperative activity metric data. All clinical practices followed the standard of care.
Our team developed Health Insurance Portability and Accountability Act–compliant software to remotely collect activity metrics from the patients’ telemonitoring devices. This platform collected real-time patient data with 1-minute granularity. In cases of a lost connection, the wearable device saved up to 7 days of minute-to-minute activity metrics as well as accessory data (eg, battery life at last sync and time of last sync). Our informatics system performed daily audits and ran a weekly summary routine to provide the study team with the previous week’s data, including yield. Yield was tracked using the total number of heart rate data points obtained during the day as a proxy for the percentage of the day the patient was wearing the device properly.
Patient clinical characteristics were collected, including demographics, comorbidities, and clinical presentation. ACS-NSQIP SRC risk calculations were evaluated and documented.
All outcome measurements were prospectively collected by the study team and recorded in the patient’s secure study record. All postoperative complications were coded and graded using the Modified Accordion Grading System (MAGS) [
To construct machine learning models based on activity metrics data, we applied feature engineering techniques to extract three types of features: statistical, semantic, and biobehavioral rhythmic features. We extracted first- and second-order statistical features from the daily step count, heart rate, and sleep time-series data [
To account for variation in the study participation period (ie, time to surgery), the extracted patient activity features were unified to consistent dimensions. Biobehavioral rhythmic features were computed for the entire study participation period, and the statistical and semantic features were generated daily. In order to eliminate varying input feature dimension caused by different lengths of monitoring periods, we used mean and variance of the statistical and semantic features of a participant as the final inputs to the machine learning models.
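The unification step described above can be illustrated with a minimal sketch. It assumes each monitored day has already been reduced to a dictionary of daily feature values; the function name, the feature names, and the use of the population variance are illustrative assumptions rather than details taken from the study.

```python
from statistics import mean, pvariance


def aggregate_daily_features(daily_features):
    """Collapse a variable-length list of per-day feature dicts into one
    fixed-length dict holding the mean and variance of each daily feature."""
    if not daily_features:
        raise ValueError("at least one monitored day is required")
    summary = {}
    for name in sorted(daily_features[0]):
        values = [day[name] for day in daily_features]
        summary[f"mean_of_{name}"] = mean(values)
        # Assumption: population variance. The source does not state whether
        # the sample or the population form was used.
        summary[f"variance_of_{name}"] = pvariance(values)
    return summary


# Two hypothetical patients monitored for different numbers of days.
patient_a = [{"skewness": 1.0, "energy": 100.0},
             {"skewness": 2.0, "energy": 300.0}]
patient_b = [{"skewness": 1.5, "energy": 150.0}] * 5

vec_a = aggregate_daily_features(patient_a)
vec_b = aggregate_daily_features(patient_b)
print(sorted(vec_a) == sorted(vec_b))  # same feature dimensions for both patients
```

Whatever the length of the monitoring period, each patient ends up with the same set of inputs, which is what allows a single model to be trained across the cohort.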
Multiple machine learning models were developed, trained, and evaluated for their ability to predict outcomes by discovering complex underlying patterns in multimodal time-series patient activity data collected from wearable devices and in patient clinical characteristics. To avoid overfitting, we used state-of-the-art “shallow” machine learning models, including random forest, gradient boosted trees (GBT), k-nearest neighbors (KNN), support vector machine (SVM) with linear kernel, and logistic regression (LR) with L1 penalty. A GBT model is an ensemble of weak decision trees that classifies the samples based on the predictions of those trees [
Leave-one-subject-out cross-validation (LOSO CV) was used for calculating the performance metrics, such as area under the receiver operating characteristic (AUROC), sensitivity, specificity, precision, and F1 score. LOSO CV was able to evaluate the model’s performance on unseen patients, namely the out-of-sample accuracy [
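The following is a minimal sketch of the LOSO CV procedure, written with the Python standard library only. A toy k-nearest neighbors scorer stands in for the tuned models used in the study, the data are synthetic, and pooling the held-out scores before computing a single AUROC is an assumption made here, because an AUROC cannot be computed on a fold that contains one patient.

```python
import math


def knn_score(train_X, train_y, x, k=3):
    """Fraction of the k nearest training patients whose label is 1."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def loso_scores(X, y, k=3):
    """Hold out each patient once, fit on the rest, score the held-out patient."""
    scores = []
    for held_out in range(len(X)):
        train_X = X[:held_out] + X[held_out + 1:]
        train_y = y[:held_out] + y[held_out + 1:]
        scores.append(knn_score(train_X, train_y, X[held_out], k))
    return scores


def auroc(y_true, scores):
    """Probability that a random positive outranks a random negative
    (ties count as one half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic, well-separated feature vectors (not study data).
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2],
     [1.0, 1.1], [1.2, 0.9], [0.9, 1.3], [1.1, 1.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

scores = loso_scores(X, y)
print(auroc(y, scores))
```

Because every patient is scored by a model that never saw that patient during training, the pooled scores approximate out-of-sample performance.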
There were three possible causes of missing data: (1) improper wearing of the device, (2) lack of user compliance (not wearing the device), and (3) loss of connectivity for longer than 7 days. For patients with missing data, we applied a two-level imputation method to the activity metrics collected by our telemonitoring devices [
To evaluate the effectiveness of the machine learning models in predicting postoperative outcomes, defined by the modified textbook outcome, we compared them with clinical patient performance status assessment tools, including the ACS-NSQIP SRC. Utilizing the ACS-NSQIP SRC as our baseline model, we evaluated the performance and efficacy of this approach and applied machine learning models to (1) patient clinical characteristics (demographics, comorbidities, and clinical presentation), (2) features derived from remotely collected activity metrics, and (3) patient clinical characteristics + features derived from remotely collected activity metrics. The comparative evaluation of the “patient activity–only” and “clinical characteristic–only” models assessed the predictive power of activity metrics, while the performance of a combined “patient activity + clinical characteristic” model, by design, tested whether activity metrics and clinical records complement each other to yield better results.
A total of 54 patients were enrolled in the study, and 48 patients completed it. Four patients had their pancreatectomy cancelled on the day of surgery because of intraoperative evidence of advanced disease, and 2 patients electively chose to withdraw for nonmedical reasons. All patients had an independent functional status. Of the 48 patients who completed the study, 29 (60%) were females and 19 (40%) were males, with an average age of 63.2 (SD 11.6) years. Patients underwent three different types of pancreatectomy, including pancreaticoduodenectomy (n=41, 85%), distal pancreatectomy (n=6, 13%), and total pancreatectomy (n=1, 2%). The surgeries were performed open in 28 (58%) cases and minimally invasively in 20 (42%) cases. Final surgical pathology was adenocarcinoma (n=36, 75%), neuroendocrine (n=7, 15%), benign disease (n=4, 8%), and metastatic renal cell carcinoma (n=1, 2%).
In our cohort, 28 (58%) patients had a textbook outcome, with the other 20 (42%) patients not achieving a textbook outcome. Fourteen patients developed 19 severe complications (MAGS score ≥3), including delayed gastric emptying (n=3), pancreatic fistula (n=3), organ space infection (n=2), postpancreatectomy hemorrhage (n=4), nonpancreatic anastomotic leak (n=1), myocardial infarction (n=1), and other (n=5). Additionally, 11 patients required readmission to the hospital. See
Patient characteristics.
Characteristic | Patients with complications (n=20) | Patients with textbook outcomes (n=28) | P value |
Age (years), mean (range) | 67.24 (48.14-80.52) | 60.26 (31.02-84.02) | .04 |
Sex, n (%) | | | .12 |
Male | 11 (55) | 8 (29) | |
Female | 9 (45) | 20 (71) | |
Race, n (%) | | | .86 |
White | 19 (95) | 25 (89) | |
Non-White | 1 (5) | 3 (11) | |
[label missing], n (%) | | | .06 |
≥5 | 12 (60) | 8 (29) | |
<5 | 8 (40) | 20 (71) | |
Smoking status, n (%) | | | .45 |
Never smoked | 11 (55) | 19 (68) | |
Active smoker with >10 pack years | 1 (5) | 3 (11) | |
Active smoker with <10 pack years | 0 | 1 (3.5) | |
Past history of smoking with >30 pack years | 7 (35) | 4 (14) | |
Past history of smoking with <30 pack years | 1 (5) | 1 (3.5) | |
[label missing], n (%) | | | .48 |
≥5 | 7 (35) | 6 (21) | |
<5 | 13 (65) | 22 (79) | |
ASAb class, n (%) | | | .07 |
1 | 0 | 1 (3.6) | |
2 | 7 (35) | 18 (64.3) | |
3 | 13 (65) | 9 (32.1) | |
BMI (kg/m²), mean (range) | 27.99 (20.30-37.00) | 29.03 (19.00-48.07) | .59 |
Prior surgery, n (%) | | | .02 |
Yes | 15 (75) | 10 (36) | |
No | 5 (25) | 18 (64) | |
Surgical approach, n (%) | | | .38 |
Open | 14 (70) | 14 (50) | |
Laparoscopic | 4 (20) | 9 (32) | |
Robotic | 2 (10) | 5 (18) | |
Type of pancreatectomy, n (%) | | | .22 |
Pancreaticoduodenectomy | 18 (90) | 23 (82) | |
Distal pancreatectomy | 1 (5) | 5 (18) | |
Total pancreatectomy | 1 (5) | 0 (0) | |
a
bASA: American Society of Anesthesiologists.
Patient activity metrics were collected over an average of 25.9 days (range 6 to 153 days) before surgery. The average daily yield across all patients, defined as the fraction of expected per-minute heart rate readings that were successfully collected in a day, was 82.1% (SD 23.5%). High data availability was defined as a day with a yield greater than or equal to 50%. On this basis, the average number of days per patient with high data availability was 19 (range 2 to 102), and the average percentage of days with high data availability per patient was 79.8% (range 14.8% to 100%). Patients took an average of 4162.1 (SD 4052.6) steps per day, had an average heart rate of 75.6 (SD 14.8) beats per minute, and had an average sleep time-series value of 2 (SD 1), measured as the mean detrended fluctuation analysis (DFA) of their sleep stages with 50-minute windows. The average ACS-NSQIP SRC estimate of the risk of developing any complication was 27.3% (SD 6.4%), of developing a serious complication was 23.3% (SD 5.5%), and of being readmitted was 15.1% (SD 3.4%).
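The yield and high data availability definitions can be expressed as a short sketch. It assumes one expected heart rate reading per minute (1440 per day) and represents each day as the minute-of-day indices at which a reading was received; the function names and the example days are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60
HIGH_AVAILABILITY_THRESHOLD = 0.5  # from the text: yield of at least 50%


def daily_yield(heart_rate_minutes):
    """Fraction of the 1440 expected per-minute heart rate readings that are
    present. `heart_rate_minutes` holds minute-of-day indices (0 to 1439)."""
    observed = {m for m in heart_rate_minutes if 0 <= m < MINUTES_PER_DAY}
    return len(observed) / MINUTES_PER_DAY


def summarize_availability(days):
    """Mean yield, and count and fraction of days with high data availability."""
    if not days:
        raise ValueError("at least one monitored day is required")
    yields = [daily_yield(day) for day in days]
    high = sum(1 for y in yields if y >= HIGH_AVAILABILITY_THRESHOLD)
    return {
        "mean_yield": sum(yields) / len(yields),
        "high_days": high,
        "high_fraction": high / len(yields),
    }


# Hypothetical patient: a full day, a half day, a quarter day, and a day off.
days = [range(0, 1440), range(0, 720), range(0, 360), []]
summary = summarize_availability(days)
print(summary)
```

Counting distinct minutes rather than raw readings keeps duplicate syncs from inflating the yield above 100%.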
Utilizing the ACS-NSQIP SRC as our baseline model, we evaluated the performance and efficacy of this approach and applied machine learning models to (1) patient clinical characteristics, which included demographics, comorbidities, and clinical presentation; (2) patient activity with features derived from remotely collected activity metrics; and (3) patient clinical characteristics + patient activity with features obtained or derived from both clinical records and activity metrics.
Performance comparison of machine learning models trained with different data sources.
 | | Metricsb | | | | |
Parametera | Model | AUROCc curve | Sensitivity | Specificity | Precision | F1 score |
ACS-NSQIP SRCd | | 0.6333 | 0.9000 | 0.0370 | 0.4091 | 0.5625 |
Patient clinical characteristics | LRe | 0.7054 | 0.9000 | 0.2321 | 0.4558 | 0.6051 |
Patient activity | SVMf | 0.7027 | 0.9000 | 0.2107 | 0.4491 | 0.5992 |
Patient clinical characteristics + patient activity | GBTg | 0.7875 | 0.9000 | 0.3929 | 0.5143 | 0.6545 |
aParameters used for the models are summarized in
bThe metrics for the machine learning models represent the average across all leave-one-subject-out cross-validation folds.
cAUROC: area under the receiver operating characteristic.
dAmerican College of Surgeons National Surgical Quality Improvement Program surgical risk calculator (ACS-NSQIP SRC) was used as the baseline model for complications from pancreatoduodenectomy.
eLR: logistic regression.
fSVM: support vector machine.
gGBT: gradient boosted trees.
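As a reading aid for the table above, the F1 score is the harmonic mean of precision and sensitivity. The sketch below recomputes it from the reported precision and sensitivity values; because the reported metrics are rounded and, per the table footnote, averaged across folds, agreement is expected only to within rounding, and the check is an illustration rather than part of the original analysis.

```python
def f1_score(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (recall)."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


# (precision, sensitivity, reported F1) copied from the table above.
reported = {
    "ACS-NSQIP SRC": (0.4091, 0.9000, 0.5625),
    "LR": (0.4558, 0.9000, 0.6051),
    "SVM": (0.4491, 0.9000, 0.5992),
    "GBT": (0.5143, 0.9000, 0.6545),
}

recomputed = {name: f1_score(p, s) for name, (p, s, _) in reported.items()}
for name, (_, _, f1) in reported.items():
    print(name, round(recomputed[name], 4), f1)
```

With sensitivity fixed at 0.9000 for every model, the differences in F1 score are driven entirely by precision.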
In our analysis, we observed that 15 out of 20 features with the highest impact discovered by SHAP were from the best performing GBT model trained on patient clinical characteristics + patient activity (see
Finally, to determine whether the amount of missing data affected the performance of the classification model, the average number of days with high data availability (again, defined as days with a yield greater than or equal to 50%) for correctly classified patients was compared with that for incorrectly classified patients. The difference in the average number of days with high data availability between correctly classified patients and incorrectly classified patients was not statistically significant (17 days, SD 10 days, versus 25 days, SD 25 days, respectively;
Analysis of variance test statistics on the features extracted from Fitbit Inspire HR (Fitbit, Inc) data.
Featuresa | Patients with complications, mean (SD) | Patients with textbook outcomes, mean (SD) | F statistic | P value | SHAPb value |
Heart rate features | | | | | |
Variance of local homogeneity | 6744.5286 (5055.2469) | 13362.2921 (7545.2961) | 11.1603 | .002 | 1.2694 |
Mean of correlation | 31.9993 (0.0007) | 31.9996 (0.0004) | 2.5324 | .12 | 0.2338 |
Mean DFAc of heart rate with 40-minute window | 22.7418 (5.3550) | 24.8816 (5.0493) | 1.9086 | .17 | 0.2214 |
Mean of energy | 202.1648 (192.6207) | 140.9836 (71.2032) | 2.2724 | .14 | 0.2064 |
Mean of skewness | 1.3182 (0.4978) | 1.1065 (0.4253) | 2.4006 | .13 | 0.1787 |
Cosinor amplitude | 6.2318 (3.3540) | 7.3569 (3.6230) | 1.1464 | .29 | 0.1507 |
Variance of correlation | 3.3737e–7 (8.8545e–7) | 9.9500e–7 (2.0791e–7) | 1.7977 | .19 | 0.1500 |
Log Cosinor amplitude | 2.2616 (0.6844) | 2.4344 (0.6922) | 0.7041 | .41 | 0.1119 |
Mean of kurtosis | 6.2530 (2.2063) | 5.6795 (2.4526) | 0.6640 | .42 | 0.0558 |
Variance DFA of heart rate with 30-minute window | 12.1549 (7.9180) | 17.6321 (11.8530) | 3.1035 | .08 | 0.0476 |
Step count features | | | | | |
Variance of daily sedentary bout | 0.4669 (0.2638) | 0.5574 (0.3587) | 0.8798 | .35 | 0.2174 |
Mean of intradaily stability | 0.1100 (0.0808) | 0.0689 (0.0368) | 5.3752 | .02 | 0.0930 |
Relative amplitude | 0.2948 (0.1653) | 0.2097 (0.0878) | 5.0969 | .03 | 0.0662 |
Intradaily stability with 60-minute window | 0.1341 (0.1034) | 0.0788 (0.0559) | 5.4469 | .02 | 0.0428 |
Sleep features | | | | | |
Mean DFA of sleep stages with 50-minute window | 2.8834 (0.3767) | 2.9634 (0.2589) | 0.7294 | .40 | 0.0471 |
Patient clinical characteristics | | | | | |
Neutrophils | 50.8000 (27.5481) | 31.5393 (30.4855) | 4.8323 | .03 | 0.9024 |
Prior surgery | 0.7500 (0.4330) | 0.3571 (0.4792) | 8.1374 | .007 | 0.3428 |
Calcium | 9.2450 (0.4955) | 9.6071 (0.6464) | 4.2378 | .05 | 0.2932 |
ASAd class | 2.6500 (0.4770) | 2.2857 (0.5249) | 5.8069 | .02 | 0.1522 |
Hyperlipidemia | 0.6000 (0.4899) | 0.3571 (0.4792) | 2.8189 | .10 | 0.0419 |
aStatistically significant features (
bSHAP: SHapley Additive exPlanations.
cDFA: detrended fluctuation analysis.
dASA: American Society of Anesthesiologists.
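To illustrate what a cosinor amplitude measures, the sketch below fits a single-component cosinor model, y = MESOR + amplitude × cos(2πt/period + acrophase), by ordinary least squares using only the Python standard library. The 24-hour period, the function names, and the synthetic heart rate signal are assumptions made for illustration; the exact cosinor settings used in the study are those given in its parameter appendix, not the ones shown here.

```python
import math


def solve_3x3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design; samples must span the cycle")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def cosinor_fit(hours, values, period=24.0):
    """Return (mesor, amplitude, acrophase) of the least-squares cosinor fit."""
    omega = 2 * math.pi / period
    # Linearized model: y = mesor + beta*cos(omega*t) + gamma*sin(omega*t)
    rows = [(1.0, math.cos(omega * t), math.sin(omega * t)) for t in hours]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(3)]
    mesor, beta, gamma = solve_3x3(xtx, xty)
    amplitude = math.hypot(beta, gamma)
    acrophase = math.atan2(-gamma, beta)  # beta = A*cos(phi), gamma = -A*sin(phi)
    return mesor, amplitude, acrophase


# Synthetic two-day heart rate rhythm sampled every 30 minutes (not study data).
hours = [i / 2 for i in range(96)]
values = [70 + 7 * math.cos(2 * math.pi * t / 24 - 1.0) for t in hours]
mesor, amplitude, acrophase = cosinor_fit(hours, values)
print(round(mesor, 3), round(amplitude, 3), round(acrophase, 3))
```

A larger amplitude indicates a stronger day-night swing around the rhythm-adjusted mean (the MESOR), which is one way a wearable signal can summarize the regularity of a patient's daily rhythm.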
Preoperative clinical evaluation and assessment for surgical candidacy play an essential role in postoperative outcomes. Patients who are more physically fit for surgery are less likely to experience complications. To better predict which patients will have poor outcomes, several tools have been developed and implemented over the years, including physical examination, patient demographics, laboratory values, and risk calculators; however, none of these are perfect. In this study, we used wearable telemonitoring technology in conjunction with machine learning to evaluate patient activity preoperatively and assess its ability to predict surgical outcomes.
Our models included patient clinical characteristics, patient activity, and patient clinical characteristics combined with patient activity, which we then compared with predictions from the ACS-NSQIP SRC. We found that all three of our machine learning models outperformed the baseline estimations from the ACS-NSQIP SRC. As shown in the results section, the ACS-NSQIP SRC had an AUROC curve of 0.6333 for predicting a textbook outcome after pancreatectomy, which is consistent with previously reported AUROC curves in national samples [
Within the machine learning model, we utilized SHAP scores to identify features with the greatest impact. Specifically, within heart rate features, the “variance of local homogeneity” in heart rate was significantly correlated with higher SHAP values. This suggests that particular attention should be paid to patients’ physiological status prior to surgery. Additionally, the “mean of intradaily stability” and “relative amplitude” of steps taken [
Physical activity is a targetable and modifiable behavior that has been shown to improve outcomes of cancer patients undergoing chemoradiation [
Based on our early results, we think that the combination of patient activity metrics collected preoperatively using wearable devices and machine learning models has the potential to reliably predict operative risks. In addition, by objectively tracking activity metrics and identifying areas of weakness, the data will provide targets for preoperative optimization and allow surgeons to more efficiently engage patients in their surgical care even before they undergo a major procedure. The ultimate goal is to decrease the likelihood of postoperative complications, which we believe will have a particularly large impact on patients with pancreatic cancer, a growing population with a high proportion of elderly and frail patients.
The study was limited by a small sample size, which could potentially increase the risk of overfitting. However, as discussed in the methods section, multiple precautions were taken to reduce the effect of overfitting. We also acknowledge the risk for selection bias, as we recruited patients with access to a smartphone, which has the potential to exclude elderly patients and patients from lower socioeconomic groups.
Machine learning models based on preliminary data outperform standard ACS-NSQIP SRC estimates when used to predict a textbook outcome after pancreatectomy. The highest performance at this task was observed when machine learning models incorporated patient clinical characteristics and activity metrics collected with wearable telemonitoring technology. In the future, this can provide physicians with real-time actionable data that can be used to modify management of patients undergoing pancreatectomy and develop interventions to increase patient activity.
Parameters used for feature extraction, imputation, and models.
ACS-NSQIP SRC: American College of Surgeons National Surgical Quality Improvement Program surgical risk calculator
ANOVA: analysis of variance
AUROC: area under the receiver operating characteristic
DFA: detrended fluctuation analysis
GBT: gradient boosted trees
IRB: institutional review board
KNN: k-nearest neighbors
LR: logistic regression
LOSO CV: leave-one-subject-out cross-validation
MAGS: Modified Accordion Grading System
MESOR: midline estimating statistic of rhythm
SHAP: SHapley Additive exPlanations
SVM: support vector machine
This work was supported by grants from The Foundation for Barnes Jewish Hospital and the BJC Health Systems Innovation Lab. GW is supported by the SPORE Grant 5P50 CA196510. REDCap is supported by Clinical and Translational Science Award (CTSA) Grant UL1 TR000448 and Siteman Comprehensive Cancer Center and NCI Cancer Center Support Grant P30 CA091842.
Authors HC and DL contributed equally as co-first authors. Authors CH and CL are co-corresponding authors.
None declared.