This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
Healthy eating interventions that use behavior change techniques such as self-monitoring and feedback have been associated with stronger effects. Mobile apps can make dietary self-monitoring easy with photography and potentially reach huge populations.
The aim of the study was to assess the factors related to sustained use of a free mobile app (“The Eatery”) that promotes healthy eating through photographic dietary self-monitoring and peer feedback.
A retrospective analysis was conducted on the sample of 189,770 people who had downloaded the app and used it at least once between October 2011 and April 2012. Adherence was defined based on frequency and duration of self-monitoring. People who had taken more than one picture were classified as “Users” and people with one or no pictures as “Dropouts”. Users who had taken at least 10 pictures and used the app for at least one week were classified as “Actives”, Users with 2-9 pictures as “Semi-actives”, and Dropouts with one picture as “Non-actives”. The associations between adherence, registration time, dietary preferences, and peer feedback were examined. Changes in healthiness ratings over time were analyzed among Actives.
Overall adherence was low: only 2.58% (4895/189,770) used the app actively. The day of week and time of day at which the app was first used were associated with adherence: 20.28% (5237/25,820) of Users had started using the app during the daytime on weekdays, compared with 15.34% (24,718/161,113) of Dropouts. Users with strict diets were more likely to be Active (14.31%, 900/6291) than those who had not defined any diet (3.99%, 742/18,590), said they ate everything (9.47%, 3040/32,090), or reported some other diet (11.85%, 213/1798) (χ²(3)=826.6,
Most people who tried out this free mobile app for dietary self-monitoring did not continue using it actively and those who did may already have been healthy eaters. Hence, the societal impact of such apps may remain small if they fail to reach those who would be most in need of dietary changes. Incorporating additional self-regulation techniques such as goal-setting and intention formation into the app could potentially increase user engagement and promote sustained use.
Despite various efforts to curb the growth of obesity, a significant part of the population still eats unhealthy food in excessive quantities. Knowledge about healthy eating is not sufficient on its own to change eating behavior [
Dietary self-monitoring prompts people to reflect on their current behavior and compare it to ideal behavior [
Traditional methods for dietary self-monitoring include more or less detailed food diaries and calorie counting [
Feedback on performance is a self-regulation technique that either reinforces the current behavior or creates a discrepancy between current and ideal behavior [
Mobile apps can provide automated feedback on the healthiness of the food based on the photo and also leverage other users to provide feedback through crowdsourcing [
Numerous mobile apps for healthy eating are available in application markets. Although they are easily within anyone’s reach, attrition is likely to be a significant challenge, since there is usually very little external pressure or incentive to continue usage [
This study assesses the overall usage and reach of a free mobile app for healthy eating (“The Eatery”) over a period from October 2011 to April 2012. Specifically, we examine the indicators of sustained use of the app, especially focusing on the initiation of self-monitoring and the influence of peer feedback.
The Eatery was a free iPhone app developed by the company Massive Health. The app was officially launched on November 1, 2011 in Apple’s application market [
Screenshots of The Eatery app: a) rating other people’s food with fat-fit scale, b) feedback received for photographed food, c) weekly summary, and d) summary of user’s time-of-day healthiness ratings and places eaten at most.
Altogether 189,770 users downloaded and used the app, The Eatery, at least once between October 15, 2011, and April 3, 2012. During this time, they generated 429,288 pictures and 7,946,447 ratings. In May 2012, Massive Health, the developer of the app, decided to make the anonymized dataset available for research purposes upon contact. The authors obtained the dataset from Massive Health in June 2012.
Data of users, pictures, and ratings included timestamps that represented the local time of the user’s mobile phone. The timestamps that were stored when the user first used the app included time zone information for 98.51% (186,933/189,770) of the users: 68.41% (127,884/186,933) of them were from the main US time zones (UTC-8 to UTC-5) and 12.48% (23,335/186,933) were from the main European or African time zones (UTC+0 to UTC+3).
Variables related to the usage of the mobile app “The Eatery”.
Variable | Description |
Number of pictures | Total number of pictures taken by the user |
Usage period | Time elapsed between the first picture and the last picture (ie, the duration of self-monitoring) |
Pictures per day | Average number of pictures the user took per day during the usage period |
Ratings given for peers | Total number of ratings the user gave for other users’ pictures |
Registration time | Day of week (Sun-Sat) and time of day when the user first used the app |
Dietary preference | The response the user gave to “How do you eat?” question during the first launch of the app. The preference categories are listed in |
Own healthiness rating | Healthiness rating the user gave for an own picture (0 to 1)a |
Picture description length | Number of characters written in the picture description |
Average healthiness rating | Mean peer rating given for the picture (0 to 1)a |
Number of ratings | Total number of peer ratings given for a picture |
Number of comments | Total number of comments from peers for a picture |
Number of likes | Total number of peers who “liked” a picture |
Difference to peer ratings | Difference between the user’s own healthiness rating and average healthiness rating for a picture |
aHealthiness ratings were stored as a decimal number from 0 (“fat”) to 1 (“fit”), whereas the user saw the ratings as numbers from 0 to 100, as in
Pictures that did not contain an actual image (“empty pictures”; 3.13%, 13,433/429,288) were removed from the data, resulting in a sample of 415,855 pictures for analysis. The researchers screened the overall quality and content of the pictures by examining a random sample. The examination revealed that, for some users, the first picture served as a test picture (for example, they took a picture of a chair to test out the application). Further examination showed that pictures that received few peer ratings typically showed something other than food and should therefore be removed from further analysis. Based on manual inspection, the threshold for a valid picture was set to 10 ratings: if the first picture taken by a user had received fewer than 10 ratings, the picture information was excluded and the second picture was used instead. If the first picture was excluded and the user had taken no other picture, or the second picture had also received fewer than 10 ratings, the user was excluded from the analyses concerning the peer feedback for the first picture. The total number of pictures for each user was adjusted after the validity check of the first two pictures, which decreased a user’s total by two pictures at most. Later pictures were not examined. In total, 398,228 pictures (92.76% of all pictures and 95.76% of non-empty pictures) were classified as valid pictures.
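The validity check described above can be sketched in a few lines of Python. This is only an illustration: the function name `check_first_pictures` and the input format are hypothetical, and the sketch assumes that each of the first two pictures with fewer than 10 ratings is subtracted from the user's total, which is one reading of the adjustment rule.

```python
from typing import List, Optional, Tuple

MIN_RATINGS = 10  # validity threshold stated in the text


def check_first_pictures(rating_counts: List[int]) -> Tuple[Optional[int], int]:
    """Apply the first-picture validity rule to one user.

    rating_counts[i] is the number of peer ratings received by the user's
    i-th non-empty picture, in the order the pictures were taken.
    Returns (index of the picture treated as the "first picture" or None,
    adjusted total number of pictures).
    """
    checked = rating_counts[:2]  # only the first two pictures are examined

    # Assumption: every invalid picture among the first two is subtracted,
    # so the total decreases by two at most.
    invalid = sum(1 for n in checked if n < MIN_RATINGS)
    adjusted_total = len(rating_counts) - invalid

    # Use the first picture if valid, otherwise fall back to the second.
    first_valid = None
    for i, n in enumerate(checked):
        if n >= MIN_RATINGS:
            first_valid = i
            break
    return first_valid, adjusted_total
```

A user for whom the function returns `None` as the first element would be left out of the first-picture feedback analyses.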
Some users had not rated their own first picture; these users were excluded when analyzing the difference between their own and average peer healthiness ratings.
Individual users in the dataset were divided into groups based on their adherence, for the analysis of different indicators of adherence (initiation context of self-monitoring and peer feedback). The level of adherence was defined based on the total number of pictures taken and the length of the usage period of the app.
Two types of adherence classifications were formed for different analyses. In the first case, two user groups were formed: (1) users who had taken no valid pictures or only one valid picture (“Dropouts”, 86.39%, 163,949/189,770), and (2) users who had taken more than one valid picture (“Users”, 13.61%, 25,821/189,770). In the second case, three activity levels were defined for users who had taken at least one valid picture: (1) “Actives” who had taken at least 10 pictures and had used the app for at least one week (2.58%, 4895/189,770), (2) “Semi-actives” who had taken at least two pictures, but fewer than 10 pictures or whose usage period was less than one week (11.03%, 20,926/189,770), and (3) “Non-actives” who had taken only one valid picture (17.36%, 32,948/189,770).
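As a compact restatement of these definitions, the following Python sketch assigns a user group and activity level from the number of valid pictures and the usage period. The function name `classify_adherence` is hypothetical, and the one-week criterion is treated here as "7 days or more", which is an assumption about the boundary case.

```python
from typing import Tuple


def classify_adherence(valid_pictures: int, usage_days: float) -> Tuple[str, str]:
    """Return (user group, activity level) for one user.

    valid_pictures: number of valid pictures after the validity check.
    usage_days: time between first and last picture, in days.
    """
    if valid_pictures == 0:
        return ("Dropout", "Non-user")
    if valid_pictures == 1:
        return ("Dropout", "Non-active")
    # Assumption: "at least one week" is interpreted as usage_days >= 7.
    if valid_pictures >= 10 and usage_days >= 7:
        return ("User", "Active")
    # At least two pictures, but too few pictures or too short a period.
    return ("User", "Semi-active")
```

For example, a user with 12 pictures taken over 3 days falls into the Semi-active level because the usage period criterion is not met.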
The proportion of users who had downloaded the app less than one week before the sampling period ended (on March 28, 2012 or later) was 2.01% (3812/189,770). Hence, they could not be classified as Actives. They were still included in the analyses due to their small number.
The association between users’ registration time and adherence level was analyzed to determine whether the temporal context of initial use could have an influence on subsequent usage activity. Registration time was categorized into seven weekdays and each day was divided into five time intervals: time between 0-5 (night), 5-10 (morning), 10-15 (daytime), 15-19 (late afternoon), and 19-24 (evening). These time intervals were chosen to correspond to the natural periods of the day and based on the assumption that most users were from Anglo-American culture, since the app was in English and roughly 68% (127,884/186,933) of the users registered from the main US time zones. The number of Dropouts and Users who had started using the app on each weekday and time of day intervals were calculated. The chi-square (χ2) test was used to compare whether the proportions of Dropouts and Users in weekday and time of day intervals (35 options) were equal to each other. Bonferroni correction was used to adjust for multiple comparisons and the adjusted significance level was set at
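The 35 weekday and time-of-day cells can be illustrated with a short Python sketch. The name `registration_cell` is hypothetical, and the interval boundaries are assumed to be half-open (for example, 05:00 counts as morning), which the text does not specify.

```python
from datetime import datetime
from typing import Tuple

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# (start hour inclusive, end hour exclusive, label); boundaries are an assumption
TIME_BINS = [
    (0, 5, "night"),
    (5, 10, "morning"),
    (10, 15, "daytime"),
    (15, 19, "late afternoon"),
    (19, 24, "evening"),
]


def registration_cell(local_time: datetime) -> Tuple[str, str]:
    """Map a local registration timestamp to a (weekday, time-of-day) cell."""
    weekday = WEEKDAYS[local_time.weekday()]  # datetime.weekday(): Monday is 0
    for start, end, label in TIME_BINS:
        if start <= local_time.hour < end:
            return (weekday, label)
    raise ValueError("hour outside 0-23")  # unreachable for valid datetimes
```

Counting Dropouts and Users per cell then yields the table of proportions that is compared across the 7 x 5 = 35 cells.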
Dietary preferences were divided into four categories based on the users’ response to “How do you eat?” question on the first use of the app.
The associations between dietary preferences and adherence were analyzed by calculating the proportion of (1) Actives out of Users + Non-actives (ie, out of all users who took at least one valid picture), and (2) Users out of Users + Non-actives for each dietary preference category. The chi-square test was used to examine whether the proportions were equal between different dietary preference categories. Tukey’s HSD (honestly significant difference) multiple comparison test among proportions was used to analyze which dietary preference categories differed from each other after obtaining significance value
Finally, the existence and length of the textual description given for the first picture taken by the user were compared between Active, Semi-active, and Non-active user groups. One-way ANOVA (analysis of variance) was used for description length and the chi-square test for the existence of the description. This analysis was done to assess the engagement level of the user during the initiation of self-monitoring.
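For readers unfamiliar with the chi-square comparisons of proportions used in these analyses, the following Python sketch computes the Pearson chi-square statistic for a contingency table. It is a generic textbook calculation, not the authors' code; the function name is hypothetical, and the example counts are the Active/other counts per dietary preference category reported in this article. The sketch is not guaranteed to reproduce the reported test statistics exactly, since the software and settings used in the study are not described here.

```python
from typing import List, Tuple


def chi_square_independence(table: List[List[float]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Rows: Not defined, Everything, Strict, Other.
# Columns: Actives, all other users with at least one valid picture.
diet_table = [
    [742, 18590 - 742],
    [3040, 32090 - 3040],
    [900, 6291 - 900],
    [213, 1798 - 213],
]
stat, dof = chi_square_independence(diet_table)
```

The P value would be obtained by comparing `stat` with a chi-square distribution with `dof` degrees of freedom, which is omitted here to keep the sketch free of external libraries.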
Numbers of users according to their dietary preferences based on “How do you eat?” question.a
“How do you eat?” | Category | Number of users, n (%) |
Not defined | Not defined | 80,118 (42.22%) |
“I eat everything!” | Everything | 87,912 (46.33%) |
“Low fat” | Strict | 7778 (4.10%) |
“Low carbs, no carbs, or paleo” | Strict | 7146 (3.77%) |
“Vegan or vegetarian” | Strict | 6223 (3.28%) |
“Complex carb diet” | Other | 2388 (1.26%) |
“Other” | Other | 2427 (1.28%) |
“Gluten free” or “gluten free” | Other | 229 (0.12%) |
None of the above | Other | 1714 (0.90%) |
Total | Strict | 17,025 (8.97%) |
Total | Other | 4715 (2.48%) |
aNote that some users provided multiple responses to the question.
The amount and quality of peer feedback given for the first picture (average healthiness score, number of likes, number of comments, and difference to peer ratings) were compared between Active, Semi-active, and Non-active user groups to determine whether higher level of feedback on the initiation of self-monitoring was connected with adherence. Only those who had at least one valid picture among the first two pictures they had taken were included because the focus was on the initial feedback. For continuous variables, one-way ANOVA was used and for binary variables, the chi-square test was used to compare whether the proportions were equal between user groups. The numbers of ratings given by the users in each dietary preference category were also calculated to determine whether the stated dietary preference would have a connection to the user’s activity in providing peer feedback to others.
Changes in healthiness ratings were analyzed only among Active users who had at least one valid picture among the first two pictures they had taken (99.35% of Actives, 4863/4895). Other user groups used the app for such a short time that no trend could reliably be identified. First, a correlation coefficient between the average healthiness rating of the first picture and all subsequent pictures was determined. A change (linear regression coefficient) in healthiness ratings as a function of picture index and corresponding
Changes in eating behavior among users with different dietary preferences were also examined. One-way ANOVA was used to compare the average healthiness rating for the first picture and the healthiness rating for all pictures between different dietary preference categories. The number of Improvers or Decliners in each dietary category was determined. The chi-square test was used to examine whether there were an equal proportion of Improvers and Decliners in each dietary preference category.
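The per-user trend used to identify Improvers and Decliners is a linear regression coefficient of healthiness rating on picture index. A minimal Python sketch of that calculation is shown below; the function name `rating_trend` is hypothetical, picture indices are assumed to start at 0, and the significance test is left out because it requires a t distribution.

```python
import math
from typing import List, Tuple


def rating_trend(ratings: List[float]) -> Tuple[float, float]:
    """Ordinary least squares slope of rating on picture index, with its standard error.

    ratings: average peer healthiness ratings (0 to 1) of one user's pictures,
    in the order the pictures were taken.
    """
    n = len(ratings)
    if n < 3:
        raise ValueError("need at least three pictures to estimate a trend")

    xs = range(n)  # picture index 0, 1, 2, ... (an assumption)
    mean_x = (n - 1) / 2
    mean_y = sum(ratings) / n

    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ratings))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual sum of squares and the standard error of the slope
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ratings))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, std_err
```

A positive slope that is large relative to its standard error would correspond to an Improver, and a negative one to a Decliner.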
Adherence data for users who downloaded the free dietary self-monitoring app between October 15, 2011 and April 3, 2012 (n=189,770).
User group | Activity level | Description | Count, n (%) | Pictures per user, mean (SD) | Usage period in days, mean (SD) |
Dropouts | Non-users | No pictures or no valid pictures | 131,001 (69.03%) | - | - |
Dropouts | Non-actives | Only 1 valid picture | 32,948 (17.36%) | - | - |
Users | Semi-actives | At least two valid pictures and less than 10 pictures or usage period shorter than 7 days | 20,926 (11.03%) | 4.1 (3.7) | 9.3 (19.2) |
Users | Actives | At least 10 pictures and usage period longer than 7 days | 4895 (2.58%) | 58.9 (99.5) | 46.6 (37.7) |
Statistics for the 398,228 valid pictures taken by 58,769 users of the dietary self-monitoring app “The Eatery”.
Description | Value, n (%) or mean (SD; range) |
Number of pictures with textual description | 293,692 (73.75%) |
Average length of textual description (if existed) as number of characters | 26.1 (18.1; 1-248) |
Average healthiness rating | 0.581 (0.195; 0.0261-0.986) |
Number of pictures having at least one like | 61,299 (15.39%) |
Average number of likes (if existed) | 1.3 (0.9; 1-21) |
Number of pictures having at least one comment | 15,247 (3.83%) |
Average number of comments (if existed) | 1.7 (1.4; 1-28) |
The associations between users’ registration time and adherence level are presented in
Most common dietary preferences (see
Engagement of the user during the initiation of self-monitoring was also assessed by examining the textual description given for the first picture. A textual description for the first picture was given by 26.09% (15,179/58,170) of users who had at least one valid picture among the first two pictures they had taken.
Proportions of Semi-active and Active users in each dietary preference category out of all users who took at least one valid picture.
Users | 1. Not defined, n (%) | 2. Everything, n (%) | 3. Strict, n (%) | 4. Other, n (%) | Test statistics | Differences in post hoc comparisons |
Actives / Users+Non-actives | 742 (4.0%) | 3040 (9.47%) | 900 (14.31%) | 213 (11.9%) | | All groups |
Users / Users+Non-actives | 7188 (38.67%) | 14,560 (45.37%) | 3174 (50.45%) | 899 (50.0%) | | All but not 3 and 4 |
Comparison of user engagement in the first self-monitoring entry between different adherence groups as measured by the presence and length of textual description for the picture.
First picture characteristics | 1. Non-actives (n | 2. Semi-actives (n | 3. Actives (n | Test statistics | Differences in post hoc comparisons |
Presence of textual description, n (%) | 5783 (17.71%) | 6824 (33.03%) | 2572 (52.89%) | | All groups |
Number of characters in description (if existed), mean (SD) | 20.1 (15.8) | 23.4 (17.5) | 26.8 (19.1) | | All groups |
Correlations between users’ adherence level and their local registration time. Black=higher proportion of Users (
Feedback received by the users’ first pictures was examined to determine whether a higher level of feedback at the initiation of self-monitoring was connected with adherence. The first picture had received at least one like for 7.68% (4470/58,170) of the users and at least one comment for 3.85% (2240/58,170) of them.
Peer feedback was also examined from the perspective of users who gave the ratings to others. Analysis of dietary preferences and rating activity found that users in the “Not defined” diet group gave 21.81% (1,732,976/7,946,447) of all ratings, users in the “Everything” group gave 52.83% (4,198,272/7,946,447), users with Strict diets gave 20.70% (1,645,134/7,946,447), and users with some Other diet gave 4.66% (370,065/7,946,447) of all ratings.
Amount and quality of peer feedback for the initial self-monitoring record in the app between different adherence groups.
First picture characteristics | 1. Non-actives (n | 2. Semi-actives (n | 3. Actives (n | Test statistics | Differences in post hoc comparisons |
Average healthiness rating, mean (SD) | 0.49 (0.21) | 0.52 (0.20) | 0.55 (0.19) | | All groups |
Difference to peer ratings, mean (SD) | 0.04 (0.22) | 0.04 (0.21) | 0.05 (0.18) | | 1 and 3, |
Having at least one like, n (%) | 2031 (6.22%) | 1792 (8.67%) | 647 (13.30%) | χ²(2)=343.6, | All groups |
Number of likes (if at least one), mean (SD) | 1.1 (0.3) | 1.1 (0.4) | 1.2 (0.4) | | 1 and 3, |
Having at least one comment, n (%) | 663 (2.03%) | 1088 (5.27%) | 489 (10.06%) | χ²(2)=909.6, | All groups |
Number of comments (if at least one), mean (SD) | 1.2 (0.6) | 1.3 (0.9) | 1.4 (1.1) | | 1 and 3, |
Among the 4863 Active users who had at least one valid picture as their first or second picture, 481 (9.89%) had a significant positive trend in healthiness scores. These “Improvers” differed from other Actives by having a higher total number of pictures (mean 126.68, SD 183.73 vs mean 51.41, SD 82.16;
Users with Strict diets had higher healthiness scores than users in other dietary preference categories and they also had the highest proportion of Improvers (
Average healthiness rating and number of users (Actives) that had a significant linear coefficient in their healthiness rating in each dietary preference category.
Scores/users | 1. Not defined (n | 2. Everything (n | 3. Strict (n | 4. Other (n | Test statistics | Differences in post hoc comparisons |
Average healthiness rating (first picture), mean (SD) | 0.54 (0.19) | 0.53 (0.19) | 0.60 (0.18) | 0.56 (0.18) | | 1 and 3, |
Average healthiness rating (all pictures), mean (SD) | 0.56 (0.08) | 0.57 (0.09) | 0.63 (0.08) | 0.60 (0.10) | | All groups |
Number of Improvers, n (%) | 55 (7.51%) | 281 (9.30%) | 125 (13.95%) | 20 (9.43%) | χ²(3)=22.5, | 1 and 3, |
Number of Decliners, n (%) | 14 (1.91%) | 72 (2.38%) | 32 (3.57%) | 6 (2.83%) | χ²(3)=5.4, | None |
Almost 190,000 people downloaded the app, The Eatery, between October 2011 and April 2012, but attrition was very high: less than 3% were active users, that is, used the app for more than a week and took 10 or more food pictures. Most of the users did not take any pictures (69%) or took only one picture (17%), which means that they only downloaded the app and experimented with it once without starting dietary self-monitoring. This is similar to most free apps, which are easy to join and try out even if there is no serious intention or commitment to start using the app [
The Eatery was not marketed as a weight loss app but instead as a method to eat healthier (“Stop counting calories, start eating better”), and hence may have attracted a large number of users with no real interest in dietary improvements and thus lacking motivation for dietary self-monitoring. However, the few active users used the application for 1.5 months on average. Dietary self-monitoring for this amount of time would be enough to lead to increased awareness of eating habits and changes in behavior, if done diligently. The average healthiness rating of all pictures was 0.58, slightly above the midpoint on the scale of 0 (“fat”) to 1 (“fit”). Hence, users did not photograph only healthy foods and there was room for improvement. Still, a positive trend in healthiness ratings was observed among only 10% of active users (0.3% of all users). Even if we assume that this trend reflects changes in their real-life eating behavior, the impact of the app on eating choices (or on choosing which foods to record) appears to have been very small. Active users took fewer than two pictures per day on average, which means that a large portion of their eating was left unrecorded. Thus, the positive trend among some users could also mean that they started “gaming the system” by selectively photographing their foods to get better ratings and comments.
Users who used the app for the first time on weekdays (especially on Tuesdays or Wednesdays) and during morning or daytime became semi-active or active users more often than those who started using the app during evenings or weekends. People’s varying eating patterns that depend on their schedules during workdays and outside work [
The dietary preferences reported on the initial use were also connected to adherence level. Users who reported a “strict” diet (low fat, low/no carbs, or vegan/vegetarian) were most likely to become active users. They also gave 21% of all ratings, although only 9% of all users belonged to the strict group. Hence, it is possible that users with strict diets were already most interested in healthy eating. This is also supported by their healthiness ratings: active users with strict diets had higher average healthiness ratings for their first picture and also higher average healthiness ratings for all pictures than users in other dietary preference groups.
The motivation of sustained users might already be seen at the initiation of self-monitoring by looking at how much time and cognitive capacity they devote to it. This is supported by the finding that more than half of the active users (53%) gave a textual description for their first picture whereas less than one-fifth of the non-active users (18%) did so. In addition to pre-existing intention to start dietary self-monitoring, the initial user experience of the app most likely influenced the users’ intention to continue using it. Positive feedback received from peers for the first picture taken by the user was associated with higher adherence; active users had higher average healthiness ratings for the first picture than less active users. This raises the question: did they happen to take a picture of a healthy food and were encouraged by the good feedback to continue using the application, or were they already healthy eaters, thus naturally photographing a healthy food? Because a higher proportion of active users also used the app for the first time during weekdays and daytime, the food that they chose to photograph first was probably their workday lunch, which is often healthier than foods that are eaten during weekends [
Although active users obtained more comments and likes for their pictures from peers than those who took only a few pictures, the total percentage of pictures with comments (4%) and likes (15%) was quite low. Thus, most users had no connection to other users apart from receiving and giving anonymous ratings. The social network formed in such a way is very loose: users neither know whose pictures they rate nor have any knowledge of who rates their pictures. The app itself did not offer explicit advice on what to do to improve eating habits or what constitutes a healthy diet. It may be that if users are merely told that their meal is unhealthy but not given any advice on what to do to make it better, they do not get enough value out of the experience and subsequently lack motivation to continue using the app [
An app like this relies on its users to provide one of its core functions, that is, peer feedback. It would be interesting to study what motivates people to participate in this crowdsourcing activity of giving ratings to others. One explanation is the reciprocity of the action: when a user rates someone else’s pictures, they also get ratings for their own. However, engaging in this activity for a long time might require an existing community or formation of stronger ties between users [
The most significant limitation of the study is the lack of information about user demographics, behavioral outcomes, and initial motives. For example, the association between outcomes and adherence to dietary self-monitoring has been found to differ between race and gender groups in weight loss interventions [
The reliability of healthiness ratings is questionable because they were entirely crowdsourced. The idea of crowdsourcing is to take an average of many individuals’ estimates resulting in an estimate that can be surprisingly close to the truth, although individuals’ separate values may lie far from it. Crowdsourced ratings can be biased, resulting from cultural differences [
Dietary decisions are often unconscious and affected by environmental factors more than people believe [
Some updates were released to the app during the six-month timeframe of data collection. These updates consisted of minor modifications and fixes in the user interface of the app. They may have had a minor influence on the user experience of the app, but they were not included in the analyses since the main features and functions remained the same.
Finally, the app utilized self-regulation techniques of self-monitoring and feedback, but lacked other techniques derived from control theory [
As with most mobile apps, the majority of users tried the dietary self-monitoring app only once. Adherence was higher among users who had diets that were likely to restrict at least some unhealthy foods, and these kinds of users were also more active in rating other users’ foods. This could mean that this kind of an application attracts users with special diets and/or those interested in food. Moreover, initiation of self-monitoring in the middle of the week during daytime and the amount of feedback from peers were connected to higher adherence.
Even though the findings show that the app reached a large number of people, its actual impact among users remained small because most did not even start dietary self-monitoring with the app. If people used the app as intended for dietary self-monitoring on a regular basis, they could experience some benefits through heightened awareness of their eating habits. Still, the app did not implement all self-regulation techniques that could have strengthened its impact and it lacked means to track changes in eating behavior systematically. Reaching those users who could benefit the most from dietary self-monitoring and maintaining their adherence remains a challenge.
ANOVA: analysis of variance
The authors thank Sylvia Cheng and Max Utter at Jawbone for providing the Massive Health dataset and assisting with preliminary data analyses. Massive Health was acquired by Jawbone in February of 2013 and Jawbone continues to support the Massive Health application and associated research. The authors also thank Aleksi Salo at Tampere University of Technology for carrying out data extraction.
This work was partially supported by the SalWe Research Program for Mind and Body (Tekes-the Finnish Funding Agency for Technology and Innovation grant 1104/10).
None declared.