This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
User engagement is key to the effectiveness of digital mental health interventions. Considerable research has examined the clinical outcomes of overall engagement with mental health apps (eg, frequency and duration of app use). However, few studies have examined how specific app use behaviors can drive change in outcomes. Understanding the clinical outcomes of more nuanced app use could inform the design of mental health apps that are more clinically effective for users.
This study aimed to classify user behaviors in a suite of mental health apps and examine how different types of app use are related to depression and anxiety outcomes. We also compare the clinical outcomes of specific types of app use with those of generic app use (ie, intensity and duration of app use) to understand what aspects of app use may drive symptom improvement.
We conducted a secondary analysis of system use data from an 8-week randomized trial of a suite of 13 mental health apps. We categorized app use behaviors through a mixed methods analysis combining qualitative content analysis and principal component analysis. Regression analyses were used to assess the association between app use and levels of depression and anxiety at the end of treatment.
A total of 3 distinct clusters of app use behaviors were identified: learning, goal setting, and self-tracking. Each specific behavior had varied effects on outcomes. Participants who engaged in self-tracking experienced reduced depression symptoms, and those who engaged with learning and goal setting at a moderate level (ie, neither too much nor too little) also had an improvement in depression. Notably, the combination of these 3 types of behaviors, which we termed “clinically meaningful use,” accounted for roughly the same amount of variance as explained by the overall intensity of app use (ie, total number of app use sessions). This suggests that our categorization of app use behaviors succeeded in capturing app use associated with better outcomes. However, anxiety outcomes were associated with neither specific behaviors nor generic app use.
This study presents the first granular examination of user interactions with mental health apps and their effects on mental health outcomes. It has important implications for the design of mobile health interventions that aim to achieve greater user engagement and improved clinical efficacy.
Over the past decade, mobile phone apps have become portals for managing health. These digital tools help users monitor physical activity, plan healthy meals, and keep track of daily moods and other personal data. Given the accessibility and ubiquity of mobile phones, researchers and clinicians have increasingly leveraged mobile phone apps to deliver health interventions and enhance self-management of chronic conditions such as depression and anxiety [
For mental health apps to be effective and successful, user engagement is critical. However, little consensus exists on how to define and measure engagement [
However, behavioral engagement metrics have typically employed broad use metrics that measure the
This study aimed to provide a categorization of the types of user behaviors in a suite of mental health apps for depression and anxiety. We then examine how the different types of app use are related to improvements in symptoms of depression and anxiety. To provide a holistic picture of app use, we also differentiate the more nuanced app use from generic app use (ie, intensity and duration of app use) and examine how these different use metrics influence outcomes. As such, this study presents the first granular classification of user interactions with mental health apps and their impact on outcomes.
This study represents a secondary analysis of data from a randomized trial examining the efficacy of coaching and app recommendations to increase engagement with IntelliCare, a suite of mental health apps (Clinicaltrials.gov NCT02801877). Full study details have been described elsewhere [
The IntelliCare platform consists of 12 clinical apps, each targeting a specific behavioral or psychological treatment strategy (eg, cognitive restructuring, behavioral activation, social support, and relaxation) to improve symptoms of depression and anxiety. The specific apps have been described in more detail elsewhere [
Participants assigned to the coach condition received 8 weeks of coaching aimed at supporting engagement. Coaching was based on a low-intensity coaching model [
Participants randomized to the recommendation condition received recommendations for new apps weekly through the Hub app. The recommendation system leveraged app use data from approximately 80,000 users who had downloaded the IntelliCare apps to identify apps that the individual was more likely to use based on their app use profile. Participants not assigned to the recommendation condition did not receive recommendations and were encouraged to explore the apps by themselves.
Usage logs for each app were recorded locally on the user’s mobile phone and were then retrieved and analyzed to extract app use metrics. In this study, we categorized 2 types of app use: clinically meaningful app use and generic app use.
Procedure for categorizing app use activities across 13 IntelliCare apps.
Viewing/listening: reading/watching/listening to content from the app (eg, playing an exercise video, viewing a coping card, and listening to a relaxing audio)
Creating/inputting: creating and editing content for the purpose of learning and cultivating a skill (eg, identifying a coping activity and creating a positive or self-affirming statement)
Setting goals: selecting, editing, or adding self-identified or assigned goals (eg, adding or deleting a checklist item and selecting a weekly goal)
Scheduling: scheduling activities or changing reminders to fit one’s schedule (eg, scheduling an upcoming exercise and changing the reminder time)
Tracking: keeping track of one’s own performance or status through checking off, rating, or logging personal activities and moods, including facts and reasons (eg, checking a completed activity, rating a level of stress, and creating a sleep log)
Reviewing: reviewing one’s own content and progress (eg, reviewing past activities and lessons).
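The six activity types above can be applied to raw usage logs by mapping each logged event to a type and counting events per type. The following is a minimal sketch under stated assumptions: the event names and the `EVENT_TO_TYPE` mapping are hypothetical illustrations and do not reproduce the study's codebook or the apps' actual log format.

```python
from collections import Counter

# Hypothetical mapping from raw log event names to the 6 activity types.
# The event names are illustrative only; they are not the study's codebook.
EVENT_TO_TYPE = {
    "play_exercise_video": "viewing",
    "view_coping_card": "viewing",
    "create_affirmation": "creating",
    "select_weekly_goal": "setting_goals",
    "change_reminder_time": "scheduling",
    "rate_stress_level": "tracking",
    "create_sleep_log": "tracking",
    "review_past_activities": "reviewing",
}

def count_activity_types(events):
    """Count logged events per activity type, skipping events with no code."""
    counts = Counter()
    for event in events:
        activity_type = EVENT_TO_TYPE.get(event)
        if activity_type is not None:
            counts[activity_type] += 1
    return counts

# Synthetic log for one participant ("open_app" has no code and is ignored).
log = [
    "view_coping_card",
    "rate_stress_level",
    "open_app",
    "create_sleep_log",
    "select_weekly_goal",
]
counts = count_activity_types(log)
print(dict(counts))
```

Events without a code are skipped rather than counted, so only activities that appear in the mapping contribute to the per-type totals.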
The primary outcomes of the study were depression and anxiety symptom severity, measured with the PHQ-9 [
Principal component analysis was performed on the 6 identified types of clinically meaningful activities, standardizing by type, to explore any underlying patterns of these activity types. Medians and IQRs of app use metrics were reported. Then, the relationship between app use metrics and treatment outcomes was analyzed using linear regression analyses, adjusting for baseline PHQ-9 or GAD-7 and randomization strata. We first plotted the bivariate relations between all use metrics and end-of-treatment outcomes, which revealed nonlinear patterns. In response, we categorized each app use metric into 4 quartiles. We considered the first quartile minimal intensity of use, the second quartile low intensity of use, the third quartile moderate intensity of use, and the fourth quartile high intensity of use. Regression models were fit to examine the relationship between the quartiles of app use metrics and outcomes, using the lowest quartile as the reference group. Regression coefficients (beta) with their 95% CIs and significance levels were reported for both unadjusted and adjusted models. In addition, the
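The quartile coding and reference-group regression described in this paragraph can be sketched as follows. This is an illustration under assumptions, not the authors' analysis code: it uses NumPy, synthetic noise-free data, and only a baseline score as covariate, and it omits the randomization strata. The function names `quartile_group` and `fit_quartile_model` are hypothetical.

```python
import numpy as np

def quartile_group(values):
    """Assign each value to a quartile group: 0 = minimal, 1 = low, 2 = moderate, 3 = high."""
    values = np.asarray(values, dtype=float)
    cutoffs = np.percentile(values, [25, 50, 75])
    # side="left" places a value equal to a cutoff in the lower group.
    return np.searchsorted(cutoffs, values, side="left")

def fit_quartile_model(use, baseline, outcome):
    """Ordinary least squares of outcome on baseline score and quartile dummies.

    The lowest quartile is the reference group, so it gets no dummy column.
    Returns coefficients ordered as [intercept, baseline, low, moderate, high].
    """
    groups = quartile_group(use)
    dummies = np.column_stack([(groups == q).astype(float) for q in (1, 2, 3)])
    design = np.column_stack(
        [np.ones(len(groups)), np.asarray(baseline, dtype=float), dummies]
    )
    coef, *_ = np.linalg.lstsq(design, np.asarray(outcome, dtype=float), rcond=None)
    return coef

# Synthetic, noise-free data for illustration only (not study data).
use = [1, 2, 3, 4, 5, 6, 7, 8]                # eg, number of app use sessions
baseline = [10, 14, 12, 16, 9, 15, 11, 13]    # baseline symptom score
true_effect = np.array([0.0, -1.0, -2.0, -1.5])
outcome = 2.0 + 0.5 * np.asarray(baseline) + true_effect[quartile_group(use)]

coef = fit_quartile_model(use, baseline, outcome)
print(np.round(coef, 2))
```

Because the synthetic outcome contains no noise, the fit recovers the generating coefficients (2, 0.5, −1, −2, −1.5) up to floating-point error. Each dummy coefficient is read as the difference in end-of-treatment score between that quartile and the minimal-use reference group, holding the baseline score constant.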
The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008.
A total of 301 eligible participants were enrolled in the randomized trial. The majority of participants were female (228/301, 75.7%), and the mean age was 37 (SD 11.84) years, ranging from 18 to 69 years. Most (237/301, 78.7%) of the participants identified themselves as white, 29 (9.6%) as African American, 10 (3.3%) as Asian, and 25 (8.3%) as “other.” The mean baseline level of depression (PHQ-9) was 13.21 (SD 4.63), and the mean baseline level of anxiety (GAD-7) was 11.98 (SD 4.02). A total of 10 participants discontinued treatment and were lost to follow-up. Further details of the sample and participant flow through the study are reported in the study by Mohr et al [
Correlation analysis showed that the 6 identified types of clinically meaningful activities were highly correlated; accordingly, we conducted a principal component analysis to further group these activity types. The analysis identified 3 clusters of meaningful activities that could best be described as: (1) “learning,” encompassing “viewing” and “creating;” (2) “goal setting,” including “setting goals” and “scheduling;” and (3) “self-tracking,” consisting of “reviewing” and “tracking.” The first 2 principal components explained 72.4% of the variability in the data (see
Principal component analysis of the types of clinically meaningful activities.
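A principal component analysis on standardized activity counts can be sketched as below. This is a minimal illustration assuming NumPy and synthetic data in which three latent factors each drive two activity types; it is not the authors' code, and the function name `pca_standardized` is hypothetical.

```python
import numpy as np

def pca_standardized(data):
    """PCA on columns standardized to mean 0 and SD 1.

    Returns (explained_variance_ratio, loadings), with components ordered
    from largest to smallest explained variance.
    """
    x = np.asarray(data, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    corr = (z.T @ z) / (len(z) - 1)            # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)    # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals / eigvals.sum(), eigvecs

# Synthetic data for illustration only (not study data): 50 participants,
# 6 activity types; each of 3 latent factors drives a pair of columns.
# Columns: viewing, creating, setting goals, scheduling, tracking, reviewing.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 3))
noise = rng.normal(scale=0.3, size=(50, 6))
activity = np.repeat(latent, 2, axis=1) + noise

ratio, loadings = pca_standardized(activity)
print(np.round(ratio, 3))
```

Standardizing each column first means the decomposition operates on the correlation matrix, so frequently performed activities do not dominate the components simply because their counts are larger.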
Self-tracking was performed most often, with a median frequency of 152 activities (IQR 61-300). Learning was performed less often, with a median frequency of 110 activities (IQR 52-191). Goal setting was used the least, with a median frequency of 59 activities (IQR 15-141). We also examined the frequency of overall clinically meaningful app use by combining all the 67 identified clinically meaningful use activities. The median frequency of clinically meaningful app use was 400 (IQR 200-608).
Over the 8-week treatment period, the median number of app use sessions was 184 (IQR 116-306), and the median duration of app use over the 8-week treatment period was 3.0 hours (IQR 1.7-5.0).
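The median and IQR summaries reported here can be computed with the Python standard library, as in the sketch below. The session counts are synthetic and the helper `median_iqr` is hypothetical. Quartile conventions differ between statistical packages, so the interpolation method chosen here is an assumption and may not match the one used in the study.

```python
import statistics

def median_iqr(values):
    """Return (median, first quartile, third quartile) of a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return median, q1, q3

# Synthetic session counts for illustration only (not study data).
sessions = [120, 95, 310, 184, 250]
median, q1, q3 = median_iqr(sessions)
print(f"median {median} (IQR {q1}-{q3})")
```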
We first examined how each of the 3 clusters of clinically meaningful activities predicted individuals’ level of depression at the end of treatment, compared with the lowest quartile (minimal use) of each cluster (
Regression models of 3 clusters of clinically meaningful activities predicting depression outcome.
Covariate | Model 1a Estimate (SE) | Model 1a P value | Model 2b Estimate (SE) | Model 2b P value | Model 3c Estimate (SE) | Model 3c P value
Intercept | 1.86 (0.93) | .047 | 1.57 (0.93) | .09 | 2.16 (0.91) | .02
Coached | −0.78 (0.52) | .13 | −0.42 (0.56) | .46 | −0.12 (0.56) | .83
Full Hub | −0.21 (0.50) | .67 | −0.28 (0.51) | .59 | −0.05 (0.51) | .93
PHQ9_baseline | 0.54 (0.05) | <.001 | 0.56 (0.06) | <.001 | 0.55 (0.06) | <.001
Learning_low intensity | 0.64 (0.73) | .39 | —e | — | — | —
Learning_moderate intensity | −2.17 (0.71) | .002 | — | — | — | —
Learning_high intensity | −1.22 (0.73) | .09 | — | — | — | —
Goal setting_low intensity | — | — | −0.62 (0.76) | .41 | — | —
Goal setting_moderate intensity | — | — | −2.08 (0.76) | .007 | — | —
Goal setting_high intensity | — | — | −0.76 (0.76) | .32 | — | —
Self-tracking_low intensity | — | — | — | — | −2.46 (0.78) | .002
Self-tracking_moderate intensity | — | — | — | — | −1.94 (0.76) | .01
Self-tracking_high intensity | — | — | — | — | −1.92 (0.73) | .009
a
b
c
dValues of reference group.
eNot applicable.
In addition to examining the 3 identified clusters of clinically meaningful activities, we also explored how outcomes were related to overall clinically meaningful app use. As shown in
PHQ-9 at the end of treatment was significantly and negatively associated with low intensity of generic app use (beta=−1.44;
Regression models of total meaningful app use, generic app use, and duration of app use predicting depression outcome.
Covariate | Model 1a Estimate (SE) | Model 1a P value | Model 2b Estimate (SE) | Model 2b P value | Model 3b Estimate (SE) | Model 3b P value
Intercept | 2.25 (0.92) | .02 | 2.24 (0.92) | .02 | 1.71 (0.93) | .07
Coached | −0.34 (0.53) | .52 | −0.58 (0.51) | .26 | −0.57 (0.54) | .23
Full Hub | −0.11 (0.51) | .82 | 0.26 (0.54) | .63 | 0.02 (0.53) | .98
PHQ9_baseline | 0.55 (0.06) | <.001 | 0.55 (0.05) | <.001 | 0.54 (0.06) | <.001
Meaningful use_low intensity | −2.00 (0.74) | .007 | —e | — | — | —
Meaningful use_moderate intensity | −2.07 (0.74) | .006 | — | — | — | —
Meaningful use_high intensity | −2.05 (0.74) | .006 | — | — | — | —
Generic app use_low intensity | — | — | −1.44 (0.72) | .047 | — | —
Generic app use_moderate intensity | — | — | −2.38 (0.73) | .001 | — | —
Generic app use_high intensity | — | — | −2.45 (0.76) | .001 | — | —
Generic app use_low duration | — | — | — | — | −0.32 (0.75) | .68
Generic app use_moderate duration | — | — | — | — | −1.52 (0.76) | .045
Generic app use_high duration | — | — | — | — | −1.24 (0.78) | .12
a
b
c
dValues of reference group.
eNot applicable.
Anxiety (GAD-7) at the end of the treatment was neither significantly associated with the 3 clusters of clinically meaningful activities (all
This study provided a categorization of user behaviors in a suite of mental health apps and investigated how different types of app use were related to improvements in depression and anxiety symptoms following an 8-week intervention. The results showed that different types of clinically meaningful activities (ie, learning, goal setting, and self-tracking) had varied effects on outcomes. Self-tracking at varied levels of intensity was related to improvement in depression symptoms, whereas only moderate intensity of learning and goal setting predicted improvement in depression symptoms. Thus, this study provides insight into how different types of app use might be conducive to improved intervention outcomes.
Drawing on a mixed methods approach, we identified 6 types of clinically meaningful activities across multiple apps, which were further grouped into 3 clusters—learning, goal setting, and self-tracking. This categorization was achieved through a combination of qualitative content analysis and quantitative statistical analysis. The results show that users engaged in self-tracking most frequently, followed by learning and goal setting. These 3 types of use activities have been well documented in mHealth and human-computer interaction (HCI) research as approaches to drive engagement and promote behavior change [
Notably, overall clinically meaningful app use (the combination of all 67 identified clinically meaningful use activities) accounted for roughly the same amount of variance in depression severity as the intensity of overall app use (ie, total number of app use sessions). This suggests that our categorization captured the activities associated with better mental health outcomes and that we accurately identified the clinically meaningful intervention components within this suite of apps. As such, we believe that the association between app use and outcome can be largely explained by these clinically meaningful use activities, which clustered into 3 types, reinforcing the importance of self-tracking, goal setting, and psychoeducation elements within mHealth interventions for depression.
More specifically, these 3 clusters of clinically meaningful activities were associated with reductions in depression symptoms at the end of treatment. In particular, self-tracking was found to be beneficial at all levels of intensity compared with minimal intensity of use. This is in accordance with HCI research suggesting that self-tracking, or personal informatics, can lead to behavior change [
It is important to note that greater amounts of engagement did not necessarily lead to greater reductions in depression. Although self-tracking was generally beneficial, only a moderate level of engagement with learning and goal setting was associated with reduced depressive symptoms. For these 2 clusters, neither high nor low intensity of use predicted better outcomes compared with minimal intensity of use. This result suggests that mHealth interventions might follow the Goldilocks principle—“Not too much. Not too little. Just right” [
The overall intensity of generic app use also predicted reductions in depression symptoms. Generally, it appears that people who engaged in higher intensity of app use had lower levels of depression at the end of treatment. However, the duration of app use minimally contributed to better outcomes. This finding corresponds to prior work suggesting that people tend to use mobile apps in very short bursts of time, given their habit of using smartphones in spare moments [
However, meaningful app use was not associated with reduced anxiety symptoms in our analysis. This is consistent with the findings in the main trial, where significant reductions in anxiety symptoms were not related to the number of app sessions or the time between first and last app use but were only associated with the number of app downloads [
Overall, this study has important implications for the design of mHealth for depression, including the following:
Self-tracking, goal setting, and learning are 3 components with clinical benefits and should be incorporated into mental health apps.
Mental health apps could be designed according to the Goldilocks principle, incorporating the “just right” amount of intervention components and promoting use at the right amount, possibly through sending user reminders or alerts based on app use data.
People tend to use apps in very short bursts of time, so mental health apps should be quick to use, have simple interactions, and support a single or limited set of related tasks.
However, because of the exploratory nature of the research, design considerations derived from this study focus only on app content and engagement. Within the wider context, research indicates that app design and quality assessment must also consider users’ lived experience, app usability and stability, and data privacy and security [
Despite its contributions and implications, this study has some limitations. First, the user activities identified in this study were not exhaustive; some activities were eliminated because of their low frequency. As a secondary analysis, this study is exploratory by nature, and future studies should continue exploring more specific types and patterns of user behaviors in using mHealth technologies and their relationships with outcomes of mental health conditions. Second, although this study demonstrated the associations of both generic and specific app use with clinical outcomes over the treatment period, it is difficult to make causal claims about the effects. The relationship between app use and symptom change is likely dynamic. For example, app use may contribute to lower subsequent symptoms, and symptom changes may in turn increase app use [
Engagement with digital health interventions is a long-standing problem; however, little is known about how users interact with mental health apps in clinically meaningful ways. This study employed a novel mixed methods approach to derive a greater understanding of users’ engagement with apps that cannot be seen through generic use data. Using a combination of qualitative and quantitative methods, we uncovered 3 clusters of clinically meaningful activities—learning, goal setting, and self-tracking—with each type associated with reductions in depression symptoms. However, different activities and intensities of use produced varied effects. Although only moderate intensity of learning and goal setting was associated with reductions in symptoms of depression, self-tracking at all levels of intensity predicted improvement in depression. Understanding the relationship between different types of user activities and clinical outcomes could inform the design of mental health apps that are more clinically effective for users.
Codebook of the categorization of app use behaviors.
GAD-7: generalized anxiety disorder-7
HCI: human-computer interaction
mHealth: mobile health
PHQ-9: Patient Health Questionnaire-9
Research was supported by the National Institute of Mental Health (T32 MH115882; R01 MH100482) and National Institute of Diabetes and Digestive and Kidney Diseases (K01 DK116925).
DCM has an ownership interest in Adaptive Health, Inc, which has a license from Northwestern University for IntelliCare, and has accepted honoraria from Apple Inc. AKG and MJK have received consulting fees from Actualize Therapy, LLC. None of the other authors have conflicts to declare.