This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
Although gamification continues to be a popular approach to increase engagement, motivation, and adherence to behavioral interventions, empirical studies have rarely focused on this topic. There is a need to empirically evaluate gamification models to increase the understanding of how to integrate gamification into interventions.
The model of gamification principles for digital health interventions proposes a set of five independent yet interrelated gamification principles. This study aimed to examine the validity and reliability of this model to inform its use in Web- and mobile-based apps.
A total of 17 digital health interventions were selected from a curated website of mobile- and Web-based apps (PsyberGuide), which makes independent and unbiased ratings on various metrics. A total of 133 independent raters trained in gamification evaluation techniques were instructed to evaluate the apps and rate the degree to which gamification principles are present. Multiple ratings (n≥20) were collected for each of the five gamification principles within each app. Existing measures, including the PsyberGuide credibility score, mobile app rating scale (MARS), and the app store rating of each app were collected, and their relationship with the gamification principle scores was investigated.
Apps varied widely in the degree of gamification implemented (mean gamification ratings ranged from 0.17 to 4.65 out of 5). Interrater reliability of gamification scores for each app was acceptable (κ≥0.5). There was no significant correlation between any of the five gamification principles and the PsyberGuide credibility score.
Overall, the results support the validity and potential utility of the model of gamification principles for digital health interventions. As expected, there was some overlap between several gamification principles and existing app measures (eg, MARS). However, the results indicate that the gamification principles are not redundant with existing measures and highlight the potential utility of a 5-factor gamification model structure in digital behavioral health interventions. These gamification principles may be used to improve user experience and enhance engagement with digital health programs.
There is substantial interest in understanding how gamification can improve electronic health (eHealth) and mobile health (mHealth) interventions.
The model of gamification principles for internet interventions proposes a set of five independent yet interrelated principles, described below.
Gamification principle and description:
Meaningful purpose: The app presents goals that align with the user’s motivations and interests
Meaningful choice: The app gives users agency over how they achieve their goals
Supporting player archetypes: Mechanics in the app leverage individual user and player characteristics
Feedback: The app communicates how user actions affect progress
Visibility: The app makes clear to users the amount of progress made and how much more is needed
The proposed gamification principles encourage developers to separate typically considered gamification mechanics (eg, points, badges, and leaderboards) from the purpose of those mechanics (eg, providing meaningful purpose, increasing user choice, and supporting player archetypes). In this way, the model encourages researchers to consider the mechanics of gamification and how those relate to the underlying motivational affordances of the user. It also provides researchers with a framework for implementing these mechanics within the technology-based interventions they create. In behavioral science, several attempts have been made to specify various behavior change strategies.
Mobile- and Web-based interventions were selected from PsyberGuide, a nonprofit endeavor that aims to provide consumers with information to aid in selecting different types of mental health apps. PsyberGuide conducts independent and unbiased reviews of mental health apps and evaluates products on three dimensions: credibility, user experience, and privacy and data security. PsyberGuide has evaluated over 200 mental health apps and is viewed as a useful standard for determining the quality of apps on these various metrics.
List of apps and their modality or purpose.
App name | Modality or purpose |
Headspace | Mindfulness and meditation |
Lumosity | Cognitive training |
iSleep Easy | Meditation and restful sleep |
FitBrains | Cognitive training |
Happify: For Stress & Worry | Well-being and happiness |
Serenita | Stress and anxiety |
SuperBetter | Goal setting, resilience, motivation |
Flowy | Breathing and relaxation |
The Mindfulness App | Mindfulness and meditation |
HAPPYneuron | Cognitive training |
Smiling Mind | Mindfulness and meditation |
Pacifica (now Sanvello) | Anxiety and depression |
Virtual Hope Box | Anxiety and depression |
Peak | Cognitive training |
Personal Zen | Stress and anxiety |
BrainHQ | Cognitive training |
Wildflowers | Mindfulness and meditation |
A novel self-report measure was developed based on the gamification principles for internet interventions model. The measure is composed of five items, each assessing one gamification principle. Items include a description of the gamification principle in question (eg, meaningful purpose) and instruct the rater to judge the presence of that principle within the intervention. Embedded within the descriptions are probing questions to help determine the extent to which the principle is present (eg, “Does the application allow the user to make decisions about how they reach their goal?”). One item was used for each gamification principle to increase rater efficiency and limit response burden. Single-item scales have been used and validated to assess complex constructs such as self-esteem, job satisfaction, and personality traits.
The Mobile App Rating Scale (MARS; Stoyanov et al) is a widely used measure of app quality that assesses dimensions including engagement, functionality, aesthetics, and information quality. MARS scores for each app were collected from PsyberGuide.
PsyberGuide credibility ratings are meant to estimate the likelihood that a given product will produce its proposed benefits. The rating is based on an assessment of the strength of research evidence, source of research evidence, specificity of the app, expertise of the development team, number of app store ratings, and recency of updates. PsyberGuide credibility rating scores are made by a team of trained reviewers consisting of undergraduate or masters-level students using an approval and consensus process (maximum score of 5.0). PsyberGuide credibility rating scores were obtained from the PsyberGuide website.
The app store rating (Apple or Android) for each intervention was obtained. App store ratings are based on a system of stars (0-5), with more stars indicating that users liked the app more.
Each of the 133 raters was randomly assigned to evaluate three apps. Raters were trained using a 2-part approach. The first part was a 75-min training session that reviewed the theory of heuristic evaluations, a core concept in human-computer interaction (HCI). Heuristic evaluations occur when trained raters use a system and rate how well its design conforms to a set of described heuristics.
Raters were given 2 weeks to evaluate their apps and were instructed to use each app for at least 15 min every day. Specifically, raters were asked to use the app as a normal user would and to examine the presence of each of the gamification principles. Although rater usage was not tracked, raters were encouraged to maintain lists of specific examples of each gamification principle they encountered and to include them in their written justifications. After 2 weeks, raters were given 1 week to complete a 10-question survey. For each gamification principle, raters were asked whether the principle was present in the app (binary) and, if so, to what degree it was present (1-5 scale). These pairs of questions were later combined to create single 6-point (0-5 scale) responses. For the degree questions (1-5 scale), a description was provided for scores of 1, 3, and 5, allowing raters some flexibility in interpreting scores between these anchors. To ensure thoughtful responses, raters were asked to justify each of their scores in a supporting paragraph. Raters were graded on completing this assignment and on providing reasonable and thoughtful justifications for their scores; they earned an average of 9.4 out of 10. The university institutional review board (IRB) was contacted with the details of this endeavor, and it was determined that no IRB protocol was necessary.
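The pairing rule described above, a binary presence question plus a 1-5 degree question collapsed into one 0-5 response, can be sketched as follows. The function name and data layout are illustrative, not taken from the study's survey tooling:

```python
from typing import Optional

def combined_score(present: bool, degree: Optional[int] = None) -> int:
    """Collapse the binary presence question and the 1-5 degree question
    into a single 0-5 rating: an absent principle scores 0, and a present
    principle keeps its 1-5 degree score."""
    if not present:
        return 0
    if degree is None or not 1 <= degree <= 5:
        raise ValueError("degree must be 1-5 when the principle is present")
    return degree

# A rater marking a principle absent vs present at degree 4
print(combined_score(False))    # 0
print(combined_score(True, 4))  # 4
```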
For the statistical analysis, outliers (ratings more than 2 SDs from the mean) were first identified; however, on review of the justifications provided by these raters, no data were removed. For each app and survey question combination, mean scores and SDs were calculated. Interrater reliability scores for each app were then calculated using the weighted Fleiss kappa to determine the degree of agreement among the independent raters.
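The outlier screen described above, flagging ratings more than 2 SDs from the mean for manual review rather than automatic removal, might look like the following sketch. The data are invented, and the paper does not state which SD variant (population vs sample) was used:

```python
import numpy as np

def flag_outliers(ratings, n_sd=2.0):
    """Return a boolean mask marking ratings more than n_sd standard
    deviations from the mean. Flagged ratings are candidates for manual
    review of the rater's written justification, not deletion."""
    r = np.asarray(ratings, dtype=float)
    sd = r.std()  # population SD; an assumption, not stated in the paper
    if sd == 0:
        return np.zeros(r.shape, dtype=bool)
    return np.abs(r - r.mean()) > n_sd * sd

# Hypothetical ratings for one app/question combination
ratings = [4, 5, 4, 3, 4, 5, 4, 0, 4, 5]
print(flag_outliers(ratings).nonzero()[0])  # only index 7 (the lone 0) is flagged
```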
Correlations and t tests were calculated between each gamification principle’s average ratings and each of the PsyberGuide credibility rating, MARS, and app store rating.
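A correlation of this kind could be computed per principle along the following lines. This is a sketch with invented per-app values; `scipy.stats.pearsonr` returns both the coefficient and a P value:

```python
from scipy import stats

# Hypothetical per-app mean ratings for one gamification principle,
# paired with each app's score on one external measure (eg, app store rating)
principle_means = [4.3, 3.9, 1.2, 4.1, 3.7, 2.9, 3.5, 4.0]
external_scores = [4.9, 4.7, 4.6, 3.7, 4.5, 3.0, 4.7, 4.4]

r, p = stats.pearsonr(principle_means, external_scores)
print(f"r = {r:.2f}, P = {p:.3f}")
```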
Mean and SD ratings for each gamification principle across raters.
App name | Meaningful purpose, mean (SD) | Meaningful choice, mean (SD) | Supporting player archetypes, mean (SD) | Feedback, mean (SD) | Visibility, mean (SD) |
Headspace | 4.29 (0.84) | 3.92 (1.04) | 3.50 (1.47) | 3.33 (1.49) | 4.08 (1.08) |
Lumosity | 4.36 (0.79) | 3.08 (1.49) | 3.40 (1.10) | 4.32 (0.93) | 3.76 (1.21) |
iSleep Easy | 3.92 (1.00) | 2.71 (1.57) | 0.17 (0.47) | 1.17 (1.40) | 0.25 (0.60) |
FitBrains | 4.10 (1.38) | 2.76 (1.69) | 3.33 (1.04) | 4.38 (1.09) | 4.29 (1.39) |
Happify: For Stress & Worry | 4.26 (1.07) | 3.61 (1.31) | 3.26 (1.72) | 3.83 (1.27) | 3.74 (1.36) |
Serenita | 3.91 (1.06) | 2.57 (1.50) | 1.43 (1.58) | 3.43 (1.47) | 3.70 (1.23) |
SuperBetter | 3.65 (1.05) | 3.61 (1.34) | 3.00 (1.41) | 3.26 (0.94) | 3.48 (1.25) |
Flowy | 1.95 (1.07) | 1.64 (1.37) | 1.91 (1.44) | 2.32 (1.49) | 2.91 (1.20) |
The Mindfulness App | 3.59 (1.34) | 3.59 (1.30) | 2.41 (1.44) | 2.18 (1.61) | 2.86 (1.42) |
HAPPYneuron | 4.15 (0.91) | 3.25 (1.30) | 2.90 (1.34) | 3.35 (1.24) | 3.55 (1.16) |
Smiling Mind | 4.09 (1.06) | 3.78 (1.18) | 2.57 (1.61) | 3.26 (1.80) | 3.96 (0.95) |
Pacifica | 4.04 (1.04) | 4.40 (0.98) | 4.56 (0.70) | 3.64 (1.29) | 4.16 (0.97) |
Virtual Hope Box | 3.30 (1.43) | 3.43 (1.38) | 2.43 (1.77) | 0.70 (1.16) | 0.17 (0.38) |
Peak | 4.35 (0.91) | 3.13 (1.73) | 2.96 (1.60) | 4.26 (0.85) | 4.65 (0.70) |
Personal Zen | 2.14 (1.12) | 0.86 (1.17) | 0.95 (1.13) | 2.86 (1.36) | 3.43 (1.37) |
BrainHQ | 4.08 (1.19) | 2.54 (1.68) | 2.58 (1.55) | 4.00 (0.82) | 3.75 (1.16) |
Wildflowers | 3.65 (1.34) | 2.78 (1.44) | 2.35 (1.68) | 3.52 (1.35) | 3.91 (1.10) |
Interrater reliability scores for each app.
App | Interrater reliability |
Headspace | 0.53 |
Lumosity | 0.57 |
iSleep Easy | 0.67 |
FitBrains | 0.55 |
Happify: For Stress & Worry | 0.51 |
Serenita | 0.53 |
SuperBetter | 0.54 |
Flowy | 0.52 |
The Mindfulness App | 0.51 |
HAPPYneuron | 0.54 |
Smiling Mind | 0.52 |
Pacifica | 0.58 |
Virtual Hope Box | 0.61 |
Peak | 0.57 |
Personal Zen | 0.58 |
BrainHQ | 0.54 |
Wildflowers | 0.51 |
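Reliability statistics like those above can be approximated in code. The paper used a weighted Fleiss kappa; as a simplification, this sketch computes the standard (unweighted) Fleiss kappa with statsmodels on invented ratings for a single app (rows are the 5 principles, columns are raters):

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical combined 0-5 scores: one row per gamification principle,
# one column per rater, for a single app
ratings = np.array([
    [4, 4, 5, 4, 3, 4],   # meaningful purpose
    [3, 3, 4, 3, 3, 2],   # meaningful choice
    [2, 3, 2, 2, 1, 2],   # supporting player archetypes
    [4, 5, 4, 4, 4, 5],   # feedback
    [4, 4, 4, 5, 4, 4],   # visibility
])

table, _ = aggregate_raters(ratings)   # counts of each score per principle
kappa = fleiss_kappa(table, method="fleiss")
print(round(kappa, 2))
```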
Correlation matrix between gamification principles across all rated apps.
Gamification principle | Meaningful purpose | Meaningful choice | Supporting player archetypes | Feedback | Visibility |
Meaningful purpose | N/Aa | 0.71 | 0.50 | 0.48 | 0.30 |
Meaningful choice | N/A | N/A | 0.69 | 0.11 | 0.11 |
Supporting player archetypes | N/A | N/A | N/A | 0.57 | 0.57 |
Feedback | N/A | N/A | N/A | N/A | 0.92 |
Visibility | N/A | N/A | N/A | N/A | N/A |
aN/A: not applicable.
The means and SDs for each gamification principle across the 17 apps are shown in the table above.
Various metrics scoring each studied app (1-5 scale).
App | PsyberGuide Credibility Score | Mobile App Rating Scale | App store rating |
Headspace | 4.64 | 4.74 | 4.9 |
Lumosity | 3.21 | 4.34 | 4.7 |
iSleep Easy | 3.55 | 3.01 | 4.6 |
FitBrains | 2.85 | 4.67 | 3.7 |
Happify: For Stress & Worry | 3.92 | 3.34 | 4.5 |
Serenita | 3.2 | 3.2 | 3 |
SuperBetter | 3.55 | 4.39 | 4.7 |
Flowy | 2.5 | 4.1 | 4.3 |
The Mindfulness App | 2.85 | 3.3 | 4.4 |
HAPPYneuron | 2.5 | 4.15 | N/Aa |
Smiling Mind | 2.85 | 4 | 4.6 |
Pacifica | 2.85 | 4.7 | 4.7 |
Virtual Hope Box | 3.92 | 3.59 | 4.4 |
Peak | 2.85 | 4.52 | 4.7 |
Personal Zen | 3.95 | 3.77 | 2.6 |
BrainHQ | 4.6 | 4.11 | 4.6 |
Wildflowers | 2.85 | 4.08 | 4.33 |
aN/A: not applicable.
This study provides empirical support for the model of the five gamification principles for internet interventions. We believe this model will help researchers develop new interventions and evaluate existing interventions that better engage users through the proper implementation and integration of gamification techniques. By evaluating the gamification principles in 17 health apps, the findings from this study indicate that the gamification principles are not redundant with existing app measures.
A weak relationship was found between the gamification principle ratings and the PsyberGuide credibility score. The PsyberGuide credibility score is based on several factors, some of which have no intuitive relationship to the gamification principles evaluated in this study. For example, one aspect of the PsyberGuide credibility score involves the amount of research funding the app has garnered, which has no direct connection with gamification. Other aspects focus on the degree to which research is available on the efficacy of the app or the frequency of updates to the app. Although some of these aspects, such as the frequency of updates, have been found to be useful predictors of some app evaluations, such as expert-rated quality or user ratings, they have no clear connection to the gamification principles assessed here.
There were significant relationships between 3 of the 5 gamification principles (supporting player archetypes, feedback, and visibility) and the MARS score. Theoretically, one would expect some overlap between our gamification model and user experience aspects such as engagement. The MARS (collected from PsyberGuide) assesses user experience dimensions including engagement, which plausibly accounts for this overlap.
Three different gamification principles (meaningful purpose, meaningful choice, and supporting player archetypes) were significantly associated with the app store ratings. In contrast to the MARS, the app store ratings are single overall ratings provided by end users. By directly sampling from end users, the app store rating may be viewed as largely a reflection of user choice. There was a strong association between app store ratings and the gamification principle of meaningful choice.
This study extends the existing literature that aims to understand technology-based behavioral interventions by identifying and coding their features.
Several potential limitations exist for this study. Most notably, although the raters were trained with several modes of material, they were undergraduate students rather than experts; however, several results from the study help limit this concern. The interrater reliability scores (weighted kappa) were all within the acceptable range (κ>0.50), which suggests that although there was variance in the scores, the raters generally gave similar ratings to the apps. In addition, raters provided written justifications for each of their scores. An expert read through these justifications with the intention of removing any rating whose justification showed clear evidence of poor quality. In the end, no ratings were deemed to lack sufficient justification, and all ratings were included in the analysis presented here.
Owing to the difficulty of calculating internal consistency and establishing construct validity for single-item measures (one item was used to assess each gamification principle), future work should examine ways to assess gamification using more items. Although our decision was influenced by a desire to reduce the burden on raters and increase the efficiency with which ratings could be completed, there are many other ways to assess gamification that should be explored in future work. Nevertheless, the results show evidence of both convergent and discriminant validity, suggesting that the measure does possess some construct validity.
Potential bias also exists with the app selection methodology. Although the PsyberGuide listing likely contained more than 17 apps that fit the inclusion criteria, this was deemed a sufficient number, given the quantity of raters and associated ratings to be obtained. An attempt was made to include a heterogeneous sample of apps that covered a variety of areas of focus and population types. However, a more systematic approach could have been used to select the apps, or, with sufficient resources, to include all apps.
There were some strong associations among gamification principles. Most notably, the principles of feedback and visibility had a strong relationship (r=0.92).
Strong correlations occurred among other gamification principle ratings as well, most of which are less easily explained. For example, the principles of meaningful choice and meaningful purpose correlate strongly (r=0.71).
In short, this paper presents the first evaluation of a method designed to assess previously proposed gamification principles.
Gamification principles and associated questionnaire used by the raters in this study.
Correlation coefficients, t test statistics, and P values between each gamification principle’s average ratings and each of the PsyberGuide credibility rating, Mobile App Rating Scale, and app store rating.
Abbreviations
eHealth: electronic health
HCI: human-computer interaction
IRB: institutional review board
MARS: Mobile App Rating Scale
mHealth: mobile health
The authors would like to acknowledge Jodie Ryu, who managed the curating of the apps, assignments of raters to apps, and other logistical matters.
SS receives funding from One Mind, of which PsyberGuide is a proprietary project. SS is the executive director of PsyberGuide.