Published in Vol 18, No 6 (2016): Jun

Mining Health App Data to Find More and Less Successful Weight Loss Subgroups

Original Paper

1National Cancer Institute, Bethesda, MD, United States

2ICF International, Rockville, MD, United States

3Pennsylvania State University, State College, PA, United States

*all authors contributed equally

Corresponding Author:

Katrina J Serrano, PhD

National Cancer Institute

9609 Medical Center Dr.

Bethesda, MD, 20892

United States

Phone: 1 2402766654

Fax: 1 2402767907


Background: More than half of all health-related smartphone app downloads involve weight, diet, and exercise. If successful, these lifestyle apps may have far-reaching effects for disease prevention and health cost savings, but few researchers have analyzed data from these apps.

Objective: The purposes of this study were to analyze data from a commercial health app (Lose It!) in order to identify successful weight loss subgroups via exploratory analyses and to verify the stability of the results.

Methods: Cross-sectional, de-identified data from Lose It! were analyzed. This dataset (n=12,427,196) was randomly split into 24 subsamples, and this study used 3 subsamples (combined n=972,687). Classification and regression tree methods were used to explore groupings of weight loss with one subsample, with descriptive analyses to examine other group characteristics. Data mining validation methods were conducted with 2 additional subsamples.

Results: In subsample 1, 14.96% of users lost 5% or more of their starting body weight. Classification and regression tree analysis identified 3 distinct subgroups: “the occasional users” had the lowest proportion (4.87%) of individuals who successfully lost weight; “the basic users” had 37.61% weight loss success; and “the power users” achieved the highest percentage of weight loss success at 72.70%. Behavioral factors delineated the subgroups, though app-related behavioral characteristics further distinguished them. Results were replicated in further analyses with separate subsamples.

Conclusions: This study demonstrates that distinct subgroups can be identified in “messy” commercial app data and the identified subgroups can be replicated in independent samples. Behavioral factors and use of custom app features characterized the subgroups. Targeting and tailoring information to particular subgroups could enhance weight loss success. Future studies should replicate data mining analyses to increase methodological rigor.

J Med Internet Res 2016;18(6):e154

Introduction



Smartphone ownership among American adults has increased from 35% in 2011 to 68% in 2015 [1]. This increase has coincided with the proliferation of smartphone apps, and 19% of all app downloads are related to health, with more than half of them involving weight, diet, and exercise [2]. This provides new opportunities to deliver interventions for health behavior change and weight loss in the United States where obesity rates have remained high [3].

Although apps show great promise for helping individuals lose weight and manage lifestyle habits [4-6], evidence to support the impact of commercial apps on health behavior and weight loss is still lacking. This may be due to the lack of evidence-based weight loss principles in currently available apps [7]. However, given the popularity of these apps, the potential implications are far-reaching, not only in terms of disease prevention (eg, diabetes, cardiovascular diseases, cancer) but also in cost savings [8-11].

Data that are collected from commercial health apps are often not collected with scientific research in mind. However, these apps can reach millions of users. If analyzed with rigorous scientific methods, the potentially rich data collected from these apps may offer important insights into how behavior change occurs in naturalistic settings among large segments of the population. Exploratory analyses, such as data mining methods, that can be used to examine existing health data are not new [11-13], but they have rarely been used to examine health data collected from commercial apps.

Furthermore, scientific methods to examine the reliability and robustness of exploratory analyses (ie, data mining validation methods) have also been available for some time [14,15], but have not been used with health app data. With millions of individuals using commercial health apps, opportunities now exist for both exploratory data mining and data mining validation methods to occur in rapid succession. Data mining validation methods increase the scientific rigor of exploratory approaches by testing whether initial findings are stable.

To our knowledge, no studies have both explored the effectiveness of a commercial weight loss app and evaluated the reliability of the exploratory findings. The purposes of this study were to (1) assess the prevalence of weight loss among overweight and obese adults from data gathered by a commercial app, (2) identify successful weight loss subgroups and their characteristics using exploratory data mining techniques, and (3) examine the reliability of the identified subgroups using independent samples.

Methods


We analyzed a subset of cross-sectional, de-identified data (n=12,427,196), which were obtained directly from Lose It! (FitNow Inc., Boston, MA, USA). Data were made available to researchers at the National Cancer Institute for research purposes only. Lose It!—launched in 2008—is a weight loss app that is available through both iOS and Android app markets, as well as through the Web. Lose It! (henceforth, called the app) provides users with tracking tools (eg, barcode scanners); connections with other devices and apps (eg, Fitbit, RunKeeper); motivation and support (eg, connection with friends); and nutrition feedback (eg, system-generated reports comparing a user’s food log with the US Department of Agriculture’s MyPlate recommendations).

In the app, a user creates an account and a weight loss plan based on height, weight, exercise level, target weight goal, and desired weekly weight loss. The app then uses this information to calculate an estimated calorie budget that is intended to produce the energy deficit required to meet the user’s weight loss goal. The weight loss plan consists of logging one’s diet, exercise, and weight through either self-report or a synced device (eg, WiFi-connected body scales). The app offers motivation and support tools by allowing users to identify friends and share progress and information with them. Users can also participate in groups designed to motivate users; for example, one featured group, “We’re all in this together!”, is described as “a group for people looking to give motivation and people looking to get motivation.”

The data analyzed were from users who had the app between 2008 and 2014. Data provided for analysis were from the app’s metadata reporting database, which is used to power the app and provides a general summary of user activity. Thus, the data analyzed were cross-sectional in nature. The dataset included the following information: age at setup of the account, gender, height, body weight, body mass index (BMI), desired goal weight, desired weekly weight loss, number of days logged in for food and exercise, number of exercise calories burned, number of calories consumed, number of times weighed in, number of days active, date of last activity, devices and apps connected to a user’s account, type of operating system used, number of friends and groups on the app, number of challenges users participated in, number of customized goals, foods, recipes, and exercises users entered, and app-specific options (eg, has a picture, uses reminders). Weight and health behavior data were self-reported, whereas technical-related data (eg, type of operating system used, app-specific options) were from the app’s database. More time-intensive longitudinal data for the full sample of users between 2008 and 2014 were not readily available at the time of analyses.

Data cleaning was required before analyses and included removing duplicate records, setting valid ranges for each variable, and distinguishing between missing and invalid data. A total of 63,641 duplicate records, in which users had exactly the same information for all weight, health, and technical-related variables, were deleted, leaving a total sample of 12,363,555. Analyses with this entire sample proved challenging and required more computing memory than a single computer typically offers. Therefore, for computing management and efficiency, this dataset was randomly split into 24 subsamples, each with a sample size of approximately 500,000. This study used 3 subsamples and excluded the following in each subsample: (1) participants who reported being younger than 18 years or older than 70 years at account setup (older adults, 65 years and older, are less likely to use health-related smartphone apps [2], so to be more conservative, we chose 70 years as the upper age limit); (2) participants who reported being younger than 18 years at the date of last activity; (3) participants who were underweight or of normal weight (BMI ≤24.9); and (4) participants with weight and weight loss values that were out of range; for example, we defined minimum weight values that exceeded start weight values as out of range.
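The four exclusion criteria can be read as a single eligibility filter. The following Python sketch is illustrative only: the record type and its field names (age_at_setup, age_at_last_activity, start_bmi, start_weight, min_weight) are hypothetical and are not the variable names used in the app's database, and only the one out-of-range example given in the text is implemented.

```python
from dataclasses import dataclass


@dataclass
class UserRecord:
    # Hypothetical field names, chosen for illustration only.
    age_at_setup: float
    age_at_last_activity: float
    start_bmi: float
    start_weight: float
    min_weight: float


def is_eligible(u: UserRecord) -> bool:
    """Return True if the record passes exclusion criteria (1)-(4)."""
    # (1) Age at account setup must be between 18 and 70 years.
    if u.age_at_setup < 18 or u.age_at_setup > 70:
        return False
    # (2) Must be at least 18 years old at the date of last activity.
    if u.age_at_last_activity < 18:
        return False
    # (3) Underweight and normal-weight users (BMI <= 24.9) are excluded.
    if u.start_bmi <= 24.9:
        return False
    # (4) Out-of-range example from the text: minimum weight above start weight.
    if u.min_weight > u.start_weight:
        return False
    return True
```

Other range checks applied during data cleaning are not described in enough detail to reproduce and are omitted here.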

The outcome of interest was weight loss, defined for the purpose of this study as losing 5% or more of a user’s starting body weight, which has been shown to lead to beneficial health effects [16-18]. This was calculated by comparing a user’s minimum weight with the starting weight reduced by 5%: if the minimum weight minus 95% of the starting weight was less than or equal to zero, weight loss was categorized as yes; all other users were categorized as no. The following predictors were included in the analyses: age, gender, number of weigh ins, target weight, weekly weight loss goal, start weight, start BMI, food and exercise days logged, average food and exercise calories logged, days active on the app, age at set up of the app, type of device or app used, type of operating system used, number of friends, number of groups, number of challenges, use of reminders, customized goals, customized recipes, customized exercises, and app-specific options.
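As a minimal sketch of the outcome definition, assuming success means that a user's minimum recorded weight is at or below 95% of the starting weight, the dichotomous variable could be derived as follows. The function and argument names are hypothetical.

```python
def lost_five_percent(start_weight: float, min_weight: float) -> bool:
    """Return True if the user lost 5% or more of starting body weight.

    Assumption: success is judged from the minimum recorded weight,
    so later regain is not taken into account.
    """
    if start_weight <= 0:
        raise ValueError("start_weight must be positive")
    threshold = 0.95 * start_weight  # starting weight reduced by 5%
    return min_weight - threshold <= 0
```

For example, a user starting at 200 lbs would need a minimum weight of 190 lbs or less to be categorized as yes.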

Statistical analysis

Classification and regression tree (CART) analysis was conducted in subsample 1 (hereafter, known as the training sample). CART methods have been increasingly applied to health behavior research for exploratory purposes [19-23]. CART analysis is a type of decision tree methodology, also called recursive partitioning, that is useful for constructing prediction models from data [19,20,24-26]. CART uses nonparametric statistics to identify mutually exclusive and exhaustive subgroups of individuals who share common characteristics that influence the dependent variable of interest. The CART procedure uses a preselected splitting criterion to assess all possible independent variables and chooses a variable (ie, splitting variable) that results in binary groups that are the most different with regard to the dependent variable. The splitting criterion used was the Gini index of diversity [25], which selects the split that maximizes the reduction in impurity or diversity of a node, thereby reducing the error in classification [19,25].
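To make the splitting criterion concrete, the sketch below computes the Gini impurity of a node with a binary outcome and the reduction in impurity produced by a candidate split. It is a simplified illustration, not the rpart implementation, and the counts in the usage note are made up.

```python
def gini_impurity(n_success: int, n_total: int) -> float:
    """Gini impurity of a node with a binary outcome.

    For two classes, 1 - p^2 - (1 - p)^2 simplifies to 2 * p * (1 - p).
    """
    if n_total == 0:
        return 0.0
    p = n_success / n_total
    return 2.0 * p * (1.0 - p)


def gini_reduction(parent, left, right) -> float:
    """Impurity reduction for a binary split.

    Each argument is a (n_success, n_total) tuple; left and right
    must together contain every observation in the parent node.
    """
    n = parent[1]
    weighted_children = (
        left[1] / n * gini_impurity(*left)
        + right[1] / n * gini_impurity(*right)
    )
    return gini_impurity(*parent) - weighted_children
```

With made-up counts, a parent node of 10 users with 5 successes has impurity 0.5; a split that separates all successes from all failures reduces the impurity by the full 0.5, whereas a split that leaves the success proportion unchanged in both children reduces it by 0. CART evaluates candidate splits in this way and selects the one with the largest reduction.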

CART methods have several advantages over more traditional approaches, such as logistic regression. Because CART is inherently nonparametric, no assumptions are made about the underlying distribution of the data. Thus, it can handle highly skewed distributions or even extreme scores or outliers [19,20,26]. CART also has sophisticated methods for handling missing data, and missing data are considered for each variable at each split point. If data are missing at a particular split point, surrogate variables that contain similar information to the primary splitter are used [27,28]. This is also an important consideration given the missing data typically seen in commercial health app data.

The CART analysis was conducted in R (version 3.1.3), using the package rpart. The default settings for rpart were used, and these parameters have been recommended by Breiman and colleagues [25]. More details about this package are provided elsewhere [28]. We then created mutually exclusive subgroups in the training sample based on the CART results. Descriptive analyses were conducted in SAS (version 9.3, SAS Institute, Inc., Cary, NC, USA) with the training sample to determine whether additional factors were uniquely associated with the various subgroups. Because of the large sample size, we were cautious about interpreting the P values; therefore, significance was determined by the unique variance explained by the predictor variables (using R2 or Cramer’s V). As a rule of thumb, the proportion of variance accounted for by the predictor variable had to be at least 1%.
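For readers unfamiliar with Cramer's V, the following sketch computes it from a contingency table of counts (for example, subgroup by a yes/no app behavior) using the Pearson chi-square statistic without continuity correction. This is a generic illustration, not the SAS procedure used in the study, and it assumes every row and column total is greater than zero.

```python
import math


def cramers_v(table):
    """Cramer's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Pearson chi-square: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # Normalize by n times (smaller table dimension - 1).
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))
```

With made-up counts, a table in which the behavior is equally common in every subgroup gives a value of 0, and a table in which the behavior perfectly separates two groups gives a value of 1.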

The CART model predictions identified from the training sample were then evaluated with subsample 2 (hereafter, known as data mining validation sample 1) to examine the robustness of the model. The area under the receiver-operating characteristic curve (AUC) was used to evaluate the accuracy of the classification tree with data mining validation sample 1. Further evaluation was conducted with subsample 3 (hereafter, known as data mining validation sample 2), and the AUC was also obtained with this subsample. The AUC analyses were conducted in R (version 3.1.3), using the package pROC. More details about this package are provided elsewhere [29]. Annotated code for these analyses was made available by the authors. For exploratory purposes, we also applied CART methods with data mining validation sample 2. We lowered the complexity parameter (ie, a criterion that takes into account the consequences of misclassification) from its default of 0.01 to 0.001, and set both the minimum number of observations required in a node before a split is attempted and the minimum number of observations in a terminal node to 3000 (1% of the sample), versus the defaults of 20 and 7, respectively.
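The AUC can be interpreted as the probability that a randomly chosen successful user receives a higher predicted score than a randomly chosen unsuccessful user, with ties counted as one half. The sketch below computes it directly from that definition. It is not the pROC implementation, it does not produce confidence intervals, and its pairwise loop is quadratic in sample size, so it is suitable only for small illustrative inputs.

```python
def auc(scores, labels):
    """Area under the ROC curve from predicted scores and 0/1 labels."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes must be present")

    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))
```

For a classification tree, the score for each user in a validation sample would typically be the proportion of successes in the training-sample terminal node to which that user is assigned, so many users share the same score and tie handling matters.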

Results

Analytic sample

Data cleaning and exclusion criteria applied to the 3 subsamples resulted in the following analytic samples: n=324,649 for subsample 1, n=324,063 for subsample 2, and n=323,975 for subsample 3 (data flow chart shown in Figure 1).

Figure 1. Data flow chart.

Statistical analysis

The CART model is displayed in Figure 2. As shown in the figure, 14.96% (48,562) of the training sample successfully lost weight. The CART analysis identified 3 distinct subgroups that we labeled for descriptive purposes: “the occasional users,” “the basic users,” and “the power users.”

Although descriptive names are given for each subgroup, to more fully understand and interpret the subgroups, a set of additional characteristics were further examined. Results for the descriptive analyses that examined additional unique characteristics among the subgroups are displayed in Table 1.

Figure 2. Classification and regression tree for identifying successful weight loss subgroups with the training sample (n=324,649).
Table 1. Additional characteristics of identified successful weight loss subgroups with the training sample (n=324,649).

| Characteristic | The occasional users [a], % or mean (standard deviation) | The basic users [b], % or mean (standard deviation) | The power users [c], % or mean (standard deviation) | Cramer’s V | R2 | P |
| --- | --- | --- | --- | --- | --- | --- |
| Age (at set up of account) | 34.5 (12.0) | 35.4 (11.3) | 39.0 (12.1) | | | |
| Start weight | 212.0 (50.8) | 211.3 (47.3) | 211.2 (47.4) | | | |
| Start BMI [d] | 33.9 (7.0) | 33.6 (6.6) | 33.0 (6.4) | | | |
| Days active on the app | 23.5 (46.0) | 21.9 (10.4) | 168.3 (174.7) | | | |
| Health behaviors | | | | | | |
| Exercise days logged | 9.8 (29.0) | 9.0 (7.7) | 80.5 (112.7) | | | |
| Exercise calories logged | 39081969.7 (2931936775.4) | 3799.2 (4979.1) | 7753953.9 (1242356057.6) | | | |
| Food calories logged | 7844318.9 (884021907.5) | 1040215.2 (100758647.6) | 11818596.1 (1075871163.0) | | | |
| Goal weight | 160.5 (33.4) | 161.8 (32.4) | 166.2 (32.7) | | | |
| Goal plan [e] | 1.7 (0.4) | 1.8 (0.4) | 1.6 (0.5) | | | |
| App behaviors | | | | | | |
| iPhone users (% yes) [f] | 71.59% | 72.84% | 77.73% | 0.0653 | | |
| Android users (% yes) [f] | 29.40% | 31.60% | 30.88% | 0.0171 | | |
| Web users (% yes) [f] | 4.01% | 3.84% | 2.94% | 0.0277 | | |
| One or more devices/apps linked with app (eg, Fitbit) (% yes) | 3.70% | 7.82% | 14.00% | 0.1487 | | |
| Has friends on the app (% yes) | 18.01% | 27.37% | 43.44% | 0.2356 | | |
| Number of friends on the app | 0.3 (1.3) | 0.6 (2.2) | 2.1 (14.4) | | | |
| Is part of a group on the app (% yes) | 1.41% | 3.21% | 5.45% | 0.0894 | | |
| Number of groups on the app | 0.0 (0.3) | 0.1 (0.4) | 0.1 (1.1) | | | |
| Has been an administrator of a challenge (% yes) | 0.02% | 0.05% | 0.32% | 0.0332 | | |
| Number of challenges participated in | 0.0 (0.0) | 0.0 (0.0) | 0.0 (0.2) | | | |
| Number of customized goals entered | 0.0 (0.4) | 0.1 (0.7) | 0.3 (1.4) | | | |
| Number of customized foods entered | 5.9 (16.6) | 7.7 (13.3) | 43.9 (81.4) | | | |
| Number of customized recipes entered | 0.4 (2.2) | 0.5 (2.0) | 4.3 (13.4) | | | |
| Number of customized exercises entered | 0.5 (8.0) | 0.6 (2.7) | 3.1 (19.9) | | | |
| Uses app reminders (% yes) | 5.97% | 8.30% | 14.23% | 0.1189 | | |
| Has a picture (% yes) | 9.60% | 15.54% | 25.70% | 0.1780 | | |
| Uses email reports (% yes) | 1.45% | 2.88% | 6.22% | 0.1048 | | |

[a] 4.87% achieved weight loss success (n=12,796).

[b] 37.61% achieved weight loss success (n=9,850).

[c] 72.70% achieved weight loss success (n=25,916).

[d] BMI, body mass index.

[e] Desired weekly weight loss (0-2 lbs).

[f] Users can download and access the app on multiple platforms and devices.

The occasional users achieved the lowest percentage of weight loss success (4.87%); these users weighed in on the app fewer than 6.5 times. Among the basic users, 37.61% achieved at least 5% weight loss; these individuals weighed in at least 6.5 times and logged food on fewer than 40 days. The power users had the highest percentage of weight loss success (72.70%) and consisted of individuals who weighed in at least 6.5 times and logged food on 40 or more days.
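The two splits that define the subgroups can be written as a short rule. The sketch below uses the thresholds reported above (6.5 weigh-ins and 40 food days logged); the function name, argument names, and returned labels are hypothetical.

```python
def assign_subgroup(weigh_ins: float, food_days_logged: float) -> str:
    """Assign a user to one of the three CART subgroups."""
    # First split: number of weigh-ins on the app.
    if weigh_ins < 6.5:
        return "occasional"
    # Second split (among frequent weighers): days with food logged.
    if food_days_logged < 40:
        return "basic"
    return "power"
```

Because the number of weigh-ins is a count, the 6.5 threshold separates users with 6 or fewer weigh-ins from those with 7 or more.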

Compared with the other subgroups identified, the power users included a higher proportion of men (36.47%) than the occasional users or the basic users, and they were active on the app for longer (about 168 days). They also logged more days of exercise. The majority (77.73%) of the power users used an iPhone, and a lower percentage were Web users as compared with the occasional or basic users. A higher proportion (14.00%) also had one or more devices/apps linked to the app versus the occasional users (3.70%) or the basic users (7.82%). The power users were also more likely to have friends on the app, to be part of a group, and to have been an administrator of a challenge, and they had more customized goals, foods, recipes, and exercises than the other subgroups. They also had a higher percentage of app customization (eg, app reminders, setting up a picture).

With respect to the robustness of the exploratory analyses, the AUC obtained from data mining validation sample 1 was 0.8327 (95% CI, 0.8306-0.8348), indicating good accuracy. The AUC obtained from data mining validation sample 2 was 0.8339 (95% CI, 0.8318-0.8359), indicating high reliability. The CART model using data mining validation sample 2 is shown in Figure 3.

The factors used to predict the initial splits were almost identical to the model obtained from the training sample. Varying the complexity parameter in data mining validation sample 2 further subdivided the weight loss subgroups, based on food calories logged and weigh-ins. The overall model, however, is comparable to the initial model that used the training sample.

Based on the results that characterized the subgroups identified in the CART analyses, customization of the app appeared to be important among those who were more successful at losing weight. The group with a higher proportion of weight loss (the power users) used more features of the app than the other 2 weight loss subgroups. To explore the extent to which customization was associated with higher weight loss success, we conducted a logistic regression analysis post hoc using the training sample, with weight loss as the outcome and customization as the predictor. Weight loss was treated as described earlier, as a dichotomous variable representing loss of 5% or more of a user’s starting weight, and customization was derived as an ordinal variable consisting of 5 values (0-4 or more) that represented the number of customization features a user had (ie, whether a user had friends; was part of a group; administered a challenge; had custom goals, exercises, foods, or recipes; used reminders; used email reports; or had a picture). The odds of weight loss success progressively increased with more customization features compared with no customization features (1 customization feature: odds ratio, OR=5.27, 95% CI=5.11-5.44; 2 customization features: OR=12.39, 95% CI=11.99-12.81; 3 customization features: OR=22.42, 95% CI=21.56-23.31; 4 customization features: OR=48.30, 95% CI=46.23-50.46). Similar results were obtained with data mining validation sample 1.
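The derivation of the customization variable and the meaning of the reported odds ratios can be sketched as follows. Two assumptions are made here that the text does not confirm: each yes/no feature passed in counts as one customization feature, and the count is capped at 4 to form the "4 or more" category. The odds ratio function is the crude odds ratio from a 2x2 table, which is what a logistic regression with a single categorical predictor would estimate for each level relative to the reference level; all names and counts are illustrative.

```python
def customization_score(feature_flags) -> int:
    """Number of customization features a user has, capped at 4 ("4 or more")."""
    return min(sum(1 for flag in feature_flags if flag), 4)


def odds_ratio(success_level: int, failure_level: int,
               success_ref: int, failure_ref: int) -> float:
    """Crude odds ratio of success for one level versus the reference level.

    Assumes all four counts are greater than zero.
    """
    odds_level = success_level / failure_level
    odds_ref = success_ref / failure_ref
    return odds_level / odds_ref
```

With made-up counts of 20 successes and 80 failures among users with one feature, and 5 successes and 95 failures among users with none, the odds ratio is (20/80)/(5/95) = 4.75.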

Figure 3. Classification and regression tree for identifying successful weight loss subgroups with data mining validation sample 2 (n=323,975), varying the complexity parameter, minimum node split, and terminal node. Note: Factors for initial splits are similar to Figure 2. Subgroups from similar splits are bolded.

Discussion

Commercial weight loss apps can reach large segments of society, and data from these apps can provide possible clues to subgroups that are more or less successful at losing weight. However, these data can be messy and few researchers have attempted to systematically detect the signal from the noise with this type of data, using exploratory data mining methods. In addition to providing a model for exploring large quantities of commercially generated mobile health data, this study used analytic techniques to systematically examine the robustness and reliability of results obtained from exploratory analyses.

Results indicated that key behavioral factors (eg, the number of times a user weighs in and the number of food days a user logs on the app) classified subgroups with varying proportions of weight loss success. On further exploration of characteristics of weight loss, users who were more successful at weight loss logged about 8 times as many days of exercise as the other subgroups. These findings are consistent with the literature demonstrating that frequent self-monitoring, such as weighing in and logging food and exercise, is associated with greater weight loss and decreased risk of weight regain [30-34].

Unexpectedly, this study found that the most successful weight loss subgroup (the power users) had a higher proportion of iPhone users than the other subgroups (77.73% vs 71.59% and 72.84%). Whether this is due to differences between iPhone users and other users or differences in the user experience of the app is unclear. Moreover, having friends on the app appears to be an important characteristic of weight loss, showing the strongest association with subgroup membership among the app behaviors with a reported Cramer’s V (about 0.24). The proportion of power users with friends on the app was about 25 percentage points higher than that of the occasional users (43.44% vs 18.01%). Studies have shown that social networks have become commonplace for individuals wanting to share information and seeking emotional support for issues regarding weight loss [35,36], and this is highly correlated with weight loss [37-40].

This study further suggests that greater customization of the app is associated with a greater likelihood of successful weight loss. Thus, although key behavioral factors are important in identifying more versus less successful weight loss subgroups, how users interact with the app may also be important. Individuals who customize their app may be more engaged with it, and those who are more engaged may be more motivated. This hypothesis warrants further investigation.

Limitations


There were a number of limitations associated with this study. First, the sample may not be representative of a national population. To examine this, we compared our entire app sample with a nationally representative sample (ie, 2008-2014 National Health Interview Survey, NHIS, data). When we restricted both samples to include only those aged 18-70 years (the app: n=10,444,981; NHIS: n=186,134 with replicate weights), we found that the app sample had a higher percentage of women (75.40%) than the NHIS sample (50.96%). The app sample was also younger (35.5 years) than the NHIS sample (42.6 years). When we applied both age and weight exclusion criteria to include only overweight and obese adults, these differences persisted, although the average BMIs were comparable between the 2 samples.

Second, the weight data were self-reported, which may lead to inaccurate data. We compared the BMI values in the app sample with those in the NHIS sample, where BMI is also calculated using self-reported data. The NHIS sample had a lower average BMI (27.8) than the app sample, where the average starting BMI was 30.4. When we examined only overweight and obese adults, the app sample had only slightly higher starting BMI values than the NHIS sample (32.8 vs 31.0). Still, whether the results from this study can generalize to overweight and obese individuals more broadly is unknown.

Third, the data we analyzed were metadata and summary data. Therefore, we could assess changes in weight only at a general level and could not examine more specific, time-intensive longitudinal patterns of weight loss.

Conclusions


This study provides an approach to apply scientific methods to large health datasets collected by commercial apps and other health behavior technologies. Using both exploratory data mining and validation methods with big data in rapid fashion can increase confidence in the results that are obtained. Researchers should look to optimize scientific rigor, especially when trying to detect signal from noise in messy datasets.

In addition, the identification of particular subgroups that are successful at weight loss may help to inform researchers and practitioners involved in designing interventions with mobile technologies and smartphone apps. For example, weight loss interventions that use mobile technologies might aim to design interventions that emphasize behavioral factors and encourage individuals to customize their app experience. Furthermore, this study used data mining techniques that aid in hypothesis generation. Future studies should test the mechanisms underlying the behavior change, in this case, weight loss.

As more and more health app data become available, methods to analyze such big data will be crucial. Indeed, the era of big data offers new opportunities to better understand health behavior and behavior change, as well as potentially advance health behavior theories that help to explain mechanisms of behavior change. Our study provides an example for researchers to take full advantage of such opportunities.

Acknowledgments


The authors thank FitNow Inc., the makers of Lose It!, for providing this de-identified dataset for analysis.

Conflicts of Interest

None declared.

References

  1. Anderson M. Pew Research Center. 2015. Technology Device Ownership: 2015 [accessed 2016-04-03]
  2. Fox S, Duggan M. Pew Research Center. Mobile Health 2012 [accessed 2015-12-21]
  3. Ogden CL, Carroll MD, Kit BK, Flegal KM. Prevalence of childhood and adult obesity in the United States, 2011-2012. JAMA 2014 Feb 26;311(8):806-814.
  4. Carter MC, Burley VJ, Nykjaer C, Cade JE. Adherence to a smartphone application for weight loss compared to website and paper diary: pilot randomized controlled trial. J Med Internet Res 2013;15(4):e32.
  5. Stephens J, Allen J. Mobile phone interventions to increase physical activity and reduce weight: a systematic review. J Cardiovasc Nurs 2013;28(4):320-329.
  6. Stephens J, Allen JK, Dennison Himmelfarb CR. “Smart” coaching to promote physical activity, diet change, and cardiovascular health. J Cardiovasc Nurs 2011;26(4):282-284.
  7. Pagoto S, Schneider K, Jojic M, DeBiasse M, Mann D. Evidence-based strategies in weight-loss mobile apps. Am J Prev Med 2013 Nov;45(5):576-582.
  8. Hammond RA, Levine R. The economic impact of obesity in the United States. Diabetes Metab Syndr Obes 2010;3:285-295.
  9. Wang YC, McPherson K, Marsh T, Gortmaker SL, Brown M. Health and economic burden of the projected obesity trends in the USA and the UK. The Lancet 2011 Aug;378(9793):815-825.
  10. Withrow D, Alter D. The economic burden of obesity worldwide: a systematic review of the direct costs of obesity. Obes Rev 2011 Feb;12(2):131-141.
  11. Behrens J. Principles and procedures of exploratory data analysis. Psychological Methods 1997;2(2):131-160.
  12. Greenhouse J. Exploratory statistical methods, with applications to psychiatric research. Psychoneuroendocrinology 1992 Oct;17(5):423-441.
  13. Tukey J. Exploratory Data Analysis. Reading, MA: Addison-Wesley; 1977.
  14. Raubertas RF, Rodewald LE, Humiston SG, Szilagyi PG. ROC curves for classification trees. Med Decis Making 1994;14(2):169-174.
  15. Bradley A. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition 1997 Jul;30(7):1145-1159.
  16. Centers for Disease Control and Prevention. Losing Weight [accessed 2015-12-21]
  17. Mertens I, Van Gaal LF. Overweight, obesity, and blood pressure: the effects of modest weight reduction. Obes Res 2000 May;8(3):270-278.
  18. Look AHEAD Research Group, Wing RR. Long-term effects of a lifestyle intervention on weight and cardiovascular risk factors in individuals with type 2 diabetes mellitus: four-year results of the Look AHEAD trial. Arch Intern Med 2010 Sep 27;170(17):1566-1575.
  19. Lemon S, Roy J, Clark M, Friedmann P, Rakowski W. Classification and regression tree analysis in public health: methodological review and comparison with logistic regression. Ann Behav Med 2003 Dec;26(3):172-181.
  20. McArdle J. Exploratory data mining using CART in the behavioral sciences. In: Cooper H, Camic P, Long D, Panter A, Rindskopf D, Sher K, editors. APA Handbook of Research Methods in Psychology. Washington, DC: American Psychological Association; 2012:405-421.
  21. Atienza AA, Yaroch AL, Mâsse LC, Moser RP, Hesse BW, King AC. Identifying sedentary subgroups: the National Cancer Institute's Health Information National Trends Survey. Am J Prev Med 2006 Nov;31(5):383-390.
  22. Dunton GF, Atienza AA, Tscherne J, Rodriguez D. Identifying combinations of risk and protective factors predicting physical activity change in high school students. Pediatr Exerc Sci 2011 Feb;23(1):106-121.
  23. King A, Goldberg J, Salmon J, Owen N, Dunstan D, Weber D, et al. Identifying subgroups of U.S. adults at risk for prolonged television viewing to inform program development. Am J Prev Med 2010 Jan;38(1):17-26.
  24. Loh W. Classification and regression trees. WIREs Data Mining Knowl Discov 2011 Jan 06;1(1):14-23.
  25. Breiman L. Classification and regression trees. New York: Chapman & Hall; 1984.
  26. Kraemer H. Evaluating medical tests: objective and quantitative guidelines. Newbury Park, CA: Sage Publications; 1992.
  27. Lewis R. An Introduction to Classification and Regression Tree (CART) Analysis. Presented at: Annual Meeting of the Society for Academic Emergency Medicine; 2000; San Francisco, CA.
  28. Therneau T, Atkinson E. Mayo Foundation. 2015 Jun. An Introduction to Recursive Partitioning Using the RPART Routines.
  29. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F. R-project. Package 'pROC' [accessed 2015-12-21]
  30. Wharton C, Johnston C, Cunningham B, Sterner D. Dietary self-monitoring, but not dietary quality, improves with use of smartphone app technology in an 8-week weight loss trial. J Nutr Educ Behav 2014;46(5):440-444.
  31. Burke L, Conroy M, Sereika S, Elci O, Styn M, Acharya S, et al. The effect of electronic self-monitoring on weight loss and dietary intake: a randomized behavioral weight loss trial. Obesity (Silver Spring) 2011 Feb;19(2):338-344.
  32. Burke L, Wang J, Sevick M. Self-monitoring in weight loss: a systematic review of the literature. J Am Diet Assoc 2011 Jan;111(1):92-102.
  33. Linde J, Jeffery R, French S, Pronk N, Boyle R. Self-weighing in weight gain prevention and weight loss trials. Ann Behav Med 2005 Dec;30(3):210-216.
  34. Wing RR, Tate DF, Gorin AA, Raynor HA, Fava JL. A self-regulation program for maintenance of weight loss. N Engl J Med 2006 Oct 12;355(15):1563-1571.
  35. Ballantine P, Stephenson R. Help me, I'm fat! Social support in online weight loss networks. J Consumer Behav 2011 Dec 23;10(6):332-337.
  36. Li V, McDonald D, Eikey E, Sweeney J, Escajeda J, Dubey G. Losing It Online: Characterizing Participation in an Online Weight Loss Community. Presented at: GROUP '14 Proceedings of the 18th International ACM Conference on Supporting Group Work; 2014; Sanibel, FL p. 35-45.
  37. Hwang KO, Farheen K, Johnson CW, Thomas EJ, Barnes AS, Bernstam EV. Quality of weight loss advice on internet forums. Am J Med 2007 Jul;120(7):604-609.
  38. Hwang KO, Ning J, Trickey AW, Sciamanna CN. Website usage and weight loss in a free commercial online weight loss program: retrospective cohort study. J Med Internet Res 2013;15(1):e11.
  39. Neve M, Morgan PJ, Jones PR, Collins CE. Effectiveness of web-based interventions in achieving weight loss and weight loss maintenance in overweight and obese adults: a systematic review with meta-analysis. Obes Rev 2010 Apr;11(4):306-321.
  40. Weinstein PK. A review of weight loss programs delivered via the Internet. J Cardiovasc Nurs 2006;21(4):251-8; quiz 259.

Abbreviations

BMI: body mass index
CART: Classification and regression tree
NHIS: National Health Interview Survey

Edited by A Moorhead; submitted 21.12.15; peer-reviewed by E Dzubur, J Wang, Y Ma, D Steinberg; comments to author 21.02.16; revised version received 09.04.16; accepted 30.04.16; published 14.06.16


©Katrina J. Serrano, Mandi Yu, Kisha I. Coa, Linda M. Collins, Audie A. Atienza. Originally published in the Journal of Medical Internet Research, 14.06.2016.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication, as well as this copyright and license information must be included.