This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
Commercial activity trackers are growing in popularity among adults and some are beginning to be marketed to children. There is, however, a paucity of independent research examining the validity of these devices to detect physical activity of different intensity levels.
The purpose of this study was to determine the validity of the output from 3 commercial youth-oriented activity trackers in 3 phases: (1) orbital shaker, (2) structured indoor activities, and (3) 4 days of free-living activity.
Four units of each activity tracker (Movband [MB], Sqord [SQ], and Zamzee [ZZ]) were tested in an orbital shaker for 5-minutes at three frequencies (1.3, 1.9, and 2.5 Hz). Participants for Phase 2 (N=14) and Phase 3 (N=16) were 6-12 year old children (50% male). For Phase 2, participants completed 9 structured activities while wearing each tracker, the ActiGraph GT3X+ (AG) research accelerometer, and a portable indirect calorimetry system to assess energy expenditure (EE). For Phase 3, participants wore all 4 devices for 4 consecutive days. Correlation coefficients, linear models, and non-parametric statistics evaluated the criterion and construct validity of the activity tracker output.
Output from all devices was significantly associated with oscillation frequency (
Across study phases, the SQ demonstrated stronger validity than the MB and ZZ. The validity of youth-oriented activity trackers may directly impact their effectiveness as behavior modification tools, demonstrating a need for more research on such devices.
The 2008 Physical Activity Guidelines for Americans recommend that children and adolescents engage in at least 60 minutes of physical activity (PA) daily [
Exploring novel resources and tools for promoting youth PA are needed. In adults, pedometers and other commercially available PA tracking devices (eg, FitBit, JawBone UP, Misfit Shine) have gained considerable popularity. Similar activity trackers are now being marketed for children. Several studies have now evaluated validity and reliability of adult activity trackers [
Activity trackers are promoted as behavior change tools to increase PA, similar to the way pedometers are used in walking-based intervention studies or community-based programs [
Before being used for intervention purposes, the validity of these commercially available activity trackers needs to be established so that researchers using these devices can be confident in their utility as behavioral tracking devices. In addition, evaluating these youth-oriented activity trackers may help consumers make educated decisions regarding device selection. Therefore, the purpose of this study was to determine the ability of youth-oriented activity trackers to detect the volume of movement in 3 phases, from highly structured to unstructured: (1) using an orbital oscillator, (2) during structured indoor activities, and (3) during 4 days of free-living activity.
The ActiGraph GT3X+ (AG; ActiGraph, LLC, Pensacola, FL) accelerometer provides an objective estimate of human PA and is used in many research and clinical applications [
This study evaluated the output from 3 commercially available activity trackers (see
Descriptions and features of the ActiGraph accelerometer, and the Movband, Sqord, and Zamzee activity trackers.
Device | Placement | Output | Features |
ActiGraph GT3X+ |
Above right hip; elastic belt | Vector Magnitude Counts from raw (30-Hz) tri-axial acceleration signal | No display; no user interface |
Movband Model 2 |
Dominant wrist | “Moves” based on tri-axial accelerometer and proprietary algorithm | Display screen; upload to website for tracking and group participation |
Sqord |
Dominant wrist | Activity “Points” based on tri-axial accelerometer and proprietary algorithm | No display; upload to website (tracking, groups/social network, avatar, challenges) |
Zamzee |
Above right hip; elastic belt | Activity “Pointz” based on tri-axial accelerometer and proprietary algorithm | No display; upload to website (tracking, groups/social network, avatar, challenges) |
Devices used (from left to right): ActiGraph GT3X+™, MovBand Model 2™, Sqord™, and the Zamzee™.
In Phase 1, criterion and construct validity were assessed by comparing activity tracker output to the oscillation frequency and to the AG output respectively. Using the orbital shaker produces a known and constant oscillation frequency, which should be detected by all activity trackers due to the uniformity of the movement. If the devices are not able to detect changes in such structured movement, it would be ill-advised to assume they would work in free-living settings. Therefore, this validation step is included as a first “hurdle” for such devices to clear. Additionally, the highly controlled oscillations allowed for the assessment of inter-unit reliability for all devices.
An orbital shaker was used to perform electronic motion testing, which produces controlled oscillations between 0.25 and 5.00 Hz. Four trays were mounted on the base of the oscillating plate of the shaker. Each tray had 4 slots to securely position the activity trackers (one of each brand) and the AG (ie, 4 of each device were tested at the same time). All devices were spun continuously for five minutes at three frequencies (1.3, 1.9, and 2.5 Hz) that were previously used to approximate light, moderate, and vigorous intensity physical activity [
Following each 5-minute oscillation frequency, total output for the MB was retrieved from the real-time display and output for both the SQ and ZZ were downloaded and retrieved from their websites. ActiGraph data were collected at 80 Hz, without the low frequency extension, post processed using ActiLife software (version 6.1) and aggregated into counts per second. Since the activity trackers were only able to provide output for the full duration of each activity, an analogous variable was obtained from the AG. Vector magnitude counts from the AG (counts per second output from all 3 axes) were summed over 5 min for each oscillation frequency.
Spearman correlation coefficients (ρ) were used to assess associations between oscillation frequency and device output (criterion validity) and Wilcoxon Rank Sum tests were used to determine whether there were significant differences in mean output from each device among the 3 oscillation frequencies. In addition, inter-unit reliability was assessed with linear-mixed models to calculate the coefficient of variation in output at each oscillation frequency.
In Phase 2, criterion and construct validity were assessed during structured indoor activities by comparing activity tracker output to energy expenditure using a portable indirect calorimetry system and to the AG output.
Children (6 to 12 years old) from the local community were recruited to participate in the validation of the activity trackers during structured activities (see
Descriptive information for participants in phase 2 and 3.
Participant characteristics | Phase 2: structured activities |
Phase 3: free living activity (n=16) |
Age, mean (SD) | 9.0 (2.0) | 8.6 (1.6) |
Gender (female), n (%) | 7 (50) | 8 (50) |
Height (cm), mean (SD) | 135.3 (13.57) | 133.9 (12.2) |
Weight (kg), mean (SD) | 36.1 (10.6) | 32.2 (8.4) |
In addition to the 3 activity trackers (MB, SQ, ZZ) and the AG, energy expenditure (oxygen consumption [VO2] expressed as ml/kg/min) was measured with the Oxycon Mobile portable indirect calorimetry system (Carefusion, Inc.); serving as a criterion measure during the structured activities. The Oxycon Mobile provides breath-by-breath analysis of gas exchange and has been validated for use in children using pediatric-sized face masks and harness [
The participants reported to the laboratory on two occasions. The child’s height and weight were measured by trained research staff to the nearest 0.1 cm and 0.1 kg, respectively, using a portable stadiometer (Weigh and Measure, LLC, Olney MD) and digital scale (Seca 876, Hanover, MD). We also obtained age (from date of birth), sex, and grade level at school. Demographic data (age, sex, height, and weight) were entered into each device website as a new user account for each participant. Once in the gymnasium, participants were fitted with the Oxycon Mobile system and one set of activity trackers. The AG and ZZ were placed above the right hip bone on an adjustable belt and the MB and SQ were placed on the dominant wrist as per manufacturer recommendations. The hip-worn AG is the standard placement and has been validated previously using similar activities in similar age groups [
Ten activities were performed (5 activities per visit). Each activity was performed for 7 minutes, with the exception of quiet sitting, which was performed for 5 minutes. Order effects on EE levels were minimized by allowing for a 5-10 minute break between each activity and by balancing the order of presentation for moderate and vigorous intensity activities. Participants were allowed to remove the Oxycon Mobile mask during breaks and water was provided. A description of the tasks for each visit is provided in
Description of structured activity visits.
Visit 1 | Visit 2 |
1. Sita: quiet sitting | 1. Sit: quiet sitting |
2. Catchb: Stood and played catch, minimal movement (research staff retrieved ball) | 2. Cards: sit at table and play cards |
3. Self-Paced Walk
Instructed to walk at a comfortable pace; Reminded child that there would be a slower and faster walk; Always preceded the Slow- and Moderate-paced walks; Research staff walked with and paced the participant to obtain a constant intensity of effort. |
3. Slow Walk: 0.5 miles/hr (0.22 meters/sec) slower than Self-Paced Walk |
4. Moderate and Vigorous Activities (Balanced Order)
Moderate Walk: 0.5 miles/hr (0.22 meters/sec) faster than Self-Paced Walk Self-Paced Jog: Jog at comfortable pace participant could maintain for 7 min; Research staff jogged with and paced the participant to obtain a constant intensity. |
4. Moderate and Vigorous Activities (Balanced Order)
Modified Tag: A tag game that involved one researcher playing tag with the participant with a goal to take ribbons off of the belt of the participant-researcher. Modified Relay: 20 meter distance marked on the floor with cones; Researcher demonstrated various calisthenics and movement patterns that involved moving from one cone to the other and back; Participant copied movements; Researcher and participant took turns for duration of activity. |
aSit lasted 5 min; all other activities lasted 7 min.
bCatch was subsequently classified as moderate intensity due to children frequently not catching the ball and having to walk/jog to retrieve it.
After each activity, the data from the activity trackers were uploaded to the device website. Total “Moves,” Points, and “Pointz” for the MB, SQ, and ZZ, respectively, were recorded for each activity. The AG accelerometer data were downloaded via the Actilife computer software (version 6.1). From the raw AG data, total vector magnitude counts for each activity were recorded (analogous to the output from the activity trackers). The Oxycon Mobile data were uploaded and summarized using the system’s software. The computers used to collect the AG and Oxycon Mobile data were synchronized and the first and last minute of the AG and EE data were removed and the remaining five minutes of data was summed; all five minutes of data from the Sit condition were retained.
Most of the data distributions from the activity trackers were not normally distributed. Therefore, Spearman rho coefficients were calculated to compare activity tracker output to the EE and the AG vector magnitude counts. Since the output variables from each device are different, comparing the absolute output values was not meaningful. Hence, within-device analyses were conducted to examine if the pattern of output from each activity tracker was similar to the pattern observed for EE and AG vector magnitude counts. To limit the number of analyses performed, activities were categorized as Sedentary (both sitting sessions, playing cards), Light (slow walk, self-paced walk), Moderate (catch, moderate walk) and Vigorous (self-paced jog, tag, relay). A repeated measures Wilcoxon Rank Sum non-parametric test was calculated to determine within device differences in output among the intensity categories.
In Phase 3, construct validity was assessed by comparing activity tracker output to the AG vector magnitude counts across all days and for each day.
The same recruitment strategy, inclusion criteria, and informed consent procedures used for Phase 2 were also used for this free living assessment.
The same demographic and anthropometric data that were collected for Phase 2 (Structured Activities) were obtained from the Phase 3 participants and used for setting up each device. The device placements were also the same as for Phase 2. Children were instructed to wear all four devices during all waking hours for the next 4 full days, except when the devices would get completely wet (eg, showering, bathing, swimming). Participants received verbal instructions and were sent home with a sheet of instructions for all devices and contact information for study staff in the event of technical problems. The AG was initialized to begin recording data at 4:00 AM in the morning after the children received their devices. After the 4 days of wearing the device, children returned all devices.
Using the ActiLife software [
One participant’s data were removed from the data set due to excessive non-wear time on two days and very low activity levels (more than 2 standard deviations below group average) which caused significant skewness in the data. One other participant also had high non-wear time on two days and one other child’s data were missing for one day. Both of these participants were retained for the analyses (removing them did not significantly affect the results). The retained data approximated a normal distribution across all output variables. Since each device uses a different algorithm to process its output, the metrics for each device were different and direct comparisons of absolute values (eg, Moves vs Points/Pointz vs Minutes of MVPA) were not possible. Pearson correlation coefficients were calculated to compare total daily metrics from each activity tracker to analogous metrics from the AG (ie, total vector magnitude counts per day; minutes of total activity and MVPA per day). Repeated measures linear models were calculated for each device separately to identify daily differences in device output.
The output for each device was highly correlated with oscillation frequency; correlations for the AG, MB, SQ, and ZZ were .96, .99, .98 and .92, respectively. Output from each device was significantly different among the oscillation frequencies (range
Coefficient of variation at each oscillation frequency across all units for each device.
Device | Frequency (Hz) | ||
1.3 | 1.9 | 2.5 | |
AGa | 0.28 | 0.41 | 0.43 |
MBb | 0.62 | 0.85 | 0.19 |
SQc | 29.8 | 3.85 | 1.93 |
ZZd | 25.5 | 12.1 | 9.75 |
aAG: ActiGraph.
bMB: Movband.
cSQ: Sqord.
dZZ: Zamzee.
Output as a function of oscillation frequency for individual units of the (A) ActiGraph, (B) Movband, (C) Sqord and (D) Zamzee.
Across all of the structured activities, Spearman correlation coefficients between EE and the AG, MB, SQ, and ZZ were .87, .61, .87, and .60, respectively. Associations among the activity trackers and the AG were .66, .90, and .66 for the MB, SQ, and ZZ, respectively. With activities categorized by intensity (sedentary, light, moderate, and vigorous), EE and AG counts increased in a step-wise fashion (
Box and Whisker plots for (A) Energy expenditure, and device output by activity intensity category for the (B) ActiGraph, (C) Movband™, (D) Sqord™, and (E) Zamzee™.
Pearson correlation coefficients with ActiGraph vector magnitude total counts by day for the Movband™, Sqord™, and Zamzee™ activity trackers.
Device output by day. a = Day 4 significantly different from Days 1 and 2 for ActiGraph and Zamzee™ (p <0.04). Note: device output has been scaled to better present all device data in one figure
ActiGraph minutes of MVPA and total PA per day; mean (SD).
Activity Intensity | Day | |||
1 | 2 | 3 | 4a | |
MVPA (moderate +vigorous PA) | 57.1 (18.1) | 65.5 (15.6) | 60.1 (28.7) | 48.0 (19.6) |
Total PA | 157.1 (30.2) | 183.1 (28.0) | 168.2 (59.0) | 144.4 (44.9) |
asignificantly different from Day 2,
Total vector magnitude counts summed over all days from the AG were significantly associated with the analogous output from the activity trackers (
The total number of steps across all days estimated by the MB was significantly associated with the step estimate from the AG (
The within device repeated measures analyses indicated that for AG and ZZ, the output was significantly greater on days 1 and 2, compared to day 4 (
The purpose of this study was to evaluate the validity of 3 youth-oriented commercially available activity trackers using 3 approaches: (1) orbital shaker testing, (2) human testing during structured activities, (3) four days of free-living activity. The major finding of this study was that no consumer device was consistently superior across all 3 approaches to device validation. The SQ, however, performed consistently well during the structured and free-living activities. Compared to the SQ and ZZ, the MB demonstrated the lowest inter-unit variability across all frequencies during the orbital shaker testing; the ZZ demonstrated the greatest inter-unit variability. The ZZ was also less sensitive to the higher intensities performed during the structured activities, and demonstrated lower overall and more variable day-to-day associations with the AG during the free-living phase of this study. However, the ZZ was the only activity tracker that identified lower activity on day 4, compared with days 1 and 2 which was similar to the analysis of the AG counts. With the increased popularity of consumer activity trackers, the use of these devices in research interventions is looming, although based on these findings, caution is warranted. For use by consumers or researchers, activity trackers should be able to differentiate between distinct intensity levels of effort. Performing 30 minutes of vigorous intensity activity but the tracker only registering some or none of the time as vigorous could de-motivate many individuals.
Inter-unit variability is an important instrument characteristic when considering the use of these devices in group settings and for tracking intervention progress or change in PA. Since the activity trackers were tightly secured in the orbital shaker, the variability observed is not a function of human variability in movement for a given activity. The greater inter-instrument variability observed for the ZZ will reduce the ability to identify differences between groups or group-level changes over time, requiring larger sample sizes in order for those differences to attain statistical significance. This inter-instrument variability might not be of major concern when examining intra-individual changes. However, these devices are currently being used in group settings (classrooms, physical education) that typically lead to comparisons among children. Thus, two children may perform the same activity but the devices may record different amounts of movement. The relatively high coefficient of variation for the SQ output during the lowest oscillation frequency would indicate that the SQ may not be sensitive to detecting light intensity movements; an important consideration since increasing evidence implicates excess sedentary time as an unhealthy behavior distinct from physical activity [
During the structured activities, the MB and ZZ trackers demonstrated a ceiling effect such that as the intensity of activity increased (according to indirect calorimetry and AG) the output from the trackers plateaued. This could be due to the detectable range for the accelerometer that is used in each device, the sampling and filtering of the raw acceleration signal, and/or the proprietary algorithms used to produce their output (Moves and “Pointz”, respectively). Because these issues are internal to the devices, we are only able to speculate on these issues. Alternatively, the large inter-quartile and overall ranges in the device output for each intensity category indicates substantial inter-individual variability in the output from these devices for the same activities. For example, the self-paced walking and jogging paces were maintained consistently by participants, although each participant’s pace was individualized. Therefore, the inability of the MB and ZZ to distinguish among light, moderate, and vigorous activities may be due, in part, to inter-individual variability, rather than a tracker hardware or software issue. However, the MB and the SQ were both worn on the same wrist with an intensity-related increase observed for the SQ Points but not for the MB Moves. Similarly, while the AG counts increased with activity intensity the ZZ “Pointz” did not increase correspondingly, even though both devices were worn on the hip.
These findings are in contrast to the one other validation study of youth activity trackers. Guthrie et al [
The associations between the activity tracker output and the AG during the four days of free living activity were low and inconsistent for the MB and ZZ, while relatively high and stable for the SQ. One major limitation of all the trackers was that none provided output for minutes per day spent in MVPA, which is the variable that would allow examination of attainment rates for PA guidelines [
The low and variable day-to-day correlations between the AG and the MB and ZZ trackers may be a reflection of improper device placement and/or inconsistent wear time, although the SQ demonstrated higher and more stable daily associations with the AG. Participants were provided their devices and instructed to wear all 4 consistently at the same time for all 4 days. Access to each activity tracker website was provided as a means of encouraging children to wear the devices each day. However, some participants may have taken off the MB and ZZ to synchronize with the associated website and may not have put it back on in the proper place or may have forgotten to put it back on right away. Additionally, children may not have reattached all of the devices after sleep time. Therefore, the total and pattern of wear time for each tracker and the AG may have been different. Although we provided each participant and his/her parent/guardian with a log sheet to record times when the devices were removed, these were not completed consistently, and some families did not complete it at all. While the ZZ needed to be removed and connected to a USB port to check on progress, neither the MB nor SQ needed to be removed for this task. The periodic removal of the ZZ may have led to children forgetting to put the ZZ back on, leading to wear times that were not consistent with the other devices. Indeed, this participant compliance limitation is present in all free-living studies. Therefore, using a semi-structured protocol would allow an assessment of criterion and/or construct validity by, for example, directly observing children in natural settings (eg, home, playground) while they wear the activity trackers and the AG or other research-grade accelerometer [
While the ZZ demonstrated low and variable associations with the AG, it was the only tracker that identified less activity on the fourth day of wear compared to days one and two; similar to the AG. It is likely that the larger variability in the daily values from the MB and SQ limited our ability to detect statistically significant day-to-day differences for these consumer devices. This may be an important consideration if the goal is to use these devices to track changes in activity at the group level, especially for smaller groups of children where inter-individual variability may overwhelm the ability to detect any statistically significant differences between groups or changes over time.
To our knowledge, this is only the second study to provide validation results for commercially-available youth-oriented activity trackers [
A limitation endemic to research with activity trackers is the lack of a common metric for direct comparisons among devices. This is due to the proprietary algorithms employed by the device manufacturers. Another possible limitation is the relatively small sample (n=14 and n=16) for the testing of human subjects. Despite this, we were able to observe the expected differences in device output for the ActiGraph, our criterion measure, indicating that we had adequate statistical power to detect meaningful differences. A post-hoc power analysis was performed using G*Power (version 3.1) [
In summary, all of the activity trackers distinguished between frequencies on the orbital shaker with limited inter-unit variability for the MB and SQ whereas the ZZ units displayed the greatest amount of inter-unit variability. During the structured activities, the SQ was able to distinguish between all of the activity intensity categories, similar to the criterion measures (weight relative energy expenditure and AG vector magnitude counts), while the MB and ZZ did not discriminate among light, moderate and vigorous activity intensity. During the 4 days of free-living activity, the ZZ was the only activity tracker to identify the activity level on the fourth day to be lower than the first and second day, similar to the AG. Visually, the pattern of output from the MB and SQ also resembled that from the AG, but high levels of inter-individual variability prevented the detection of those patterns as statistically significant.
Of the devices tested, the Sqord demonstrated stronger validity compared with the Movband and Zamzee across study phases. Youth activity tracker manufacturers may use this information to assist with product development and refinement. Interventionists can use this information to assess the utility of these youth-oriented activity trackers as behavior change tools. Physical activity researchers may use this information to conduct additional investigations of other youth- and adult-oriented activity trackers. Future research studies should assess the validity of youth activity trackers in larger, more diverse samples and assess the reliability of these trackers over longer periods of use, which would be important if such devices are to be used as behavioral modification tools.
ActiGraph
MovBand
Sqord
Zamzee
energy expenditure
physical activity
moderate + vigorous physical activity
This study was funded by the Reebok Foundation.
None declared.