Background: Randomized controlled trials (RCTs) with vigorous study designs are vital for determining the efficacy of treatments. Despite the high internal validity attributed to RCTs, external validity concerns limit the generalizability of results to the general population. Bias can be introduced, for example, when study participants who self-select into a trial are more motivated to comply with study conditions than are other individuals. These external validity considerations extend to e-mental health (eMH) research, especially when eMH tools are designed for public access and provide minimal or no supervision.
Objective: Clustering techniques were employed to identify engagement profiles of RCT participants and community users of a self-guided eMH program. This exploratory approach inspected actual, not theorized, RCT participant and community user engagement patterns. Both samples had access to the eMH program over the same time period and received identical usage recommendations on the eMH program website. The aim of this study is to help gauge expectations of similarities and differences in usage behaviors of an eMH tool across evaluation and naturalistic contexts.
Methods: Australian adults signed up to myCompass, a self-guided online treatment program created to reduce mild to moderate symptoms of negative emotions. They did so either by being part of an RCT onboarding (160/231, 69.6% female) or by accessing the program freely on the internet (5563/8391, 66.30% female) between October 2011 and October 2012. During registration, RCT participants and community users provided basic demographic information. Usage metrics (number of logins, trackings, and learning activities) were recorded by the system.
Results: Samples at sign-up differed significantly in age (P=.003), with community users being on average 3 years older (mean 41.78, SD 13.64) than RCT participants (mean 38.79, SD 10.73). Furthermore, frequency of program use was higher for RCT participants on all usage metrics compared to community users through the first 49 days after registration (all P values <.001). Two-step cluster analyses revealed 3 user groups in the RCT sample (Nonstarters, 10-Timers, and 30+-Timers) and 2 user groups in the community samples (2-Timers and 20-Timers). Groups seemed comparable in patterns of use but differed in magnitude, with RCT participant usage groups showing more frequent engagement than community usage groups. Only the high-usage group among RCT participants approached myCompass usage recommendations.
Conclusions: Findings suggested that external validity concerns of RCT designs may arise with regards to the predicted magnitude of eMH program use rather than overall usage styles. Following up RCT nonstarters may help provide unique insights into why individuals choose not to engage with an eMH program despite generally being willing to participate in an eMH evaluation study. Overestimating frequency of engagement with eMH tools may have theoretical implications and potentially impact economic considerations for plans to disseminate these tools to the general public.
Well-designed randomized controlled trials (RCTs) are widely seen as the gold standard for determining treatment efficacy as random assignment to either a treatment or control group allows for the isolation of the treatment effect from both known and unknown confounding factors . Although the importance of RCTs in establishing internal validity is undisputed, researchers have pointed out the external validity concerns of RCTs [ ]. These concerns relate to participant selection, attention, retention, researcher contact, and specifics and frequency of data collection—all of which can limit the generalizability of findings to the general public.
Such external validity considerations may be even more justified when considering those RCTs that evaluate e-mental health (eMH) programs, which are designed to deliver effective, scalable mental health care in the community [- ]. For example, RCTs of eMH programs often selectively recruit from online communities [ ] and provide a level of direction for adhering to usage recommendations (eg, reminders) that users in the general public will not encounter, particularly in self-guided eMH programs. These top-down practices affecting user attrition have been described as “push factors” [ ]. RCT participants may also be particularly motivated to follow the research protocol and have researcher contact, which likely differs from how the general public experience an eMH program. Furthermore, deterministic approaches to evaluating eMH program effectiveness do not necessarily reflect the ever-changing eMH landscape [ ]. As Sieverink and colleagues [ ] pointed out, RCTs lack the ability to inform about processes (ie, how behavior evolves between the pre-, post-, and follow-up assessments) or which program components (or combination thereof) contribute to improvements in mental health outcomes. It is therefore pivotal for eMH interventions to show that the treatment effects initially shown in RCTs can translate into real-world benefits in practicable ways.
Despite the need for eMH programs to be applicable to real-world conditions, information on the external validity of eMH treatment outcomes is not widely available. In a recent systematic review of digital interventions addressing comorbid depressive symptoms and substance use, only 1 of the 6 studies examined reported on the comparability of the sample used to the wider population . Similarly, in a review paper of mobile apps promoting physical activity, Blackman and colleagues [ ] found that all mobile health intervention studies considered (N=20) reported on treatment effectiveness, but only 4 of these studies reported on how representative the study sample was for the target population.
This problem extends to eMH program design. A considerable body of research examining the effects of different eMH design features on behavioral changes has not addressed the question of how applicable these findings are once the programs are disseminated . Although at least some studies address limitations to the generalization of eMH trial findings, studies examining how or if study protocols influence eMH engagement behaviors are rare. Arguably, whether or not a study protocol influences participant behavior constitutes another important factor in establishing the external validity of eMH findings [ ]. One comparison between RCT and real-world uptake was undertaken with moodgym, an eMH program aimed at reducing anxiety and depression. It showed that public registrants were less likely than RCT participants to complete the recommended number of treatment modules [ ]. Interestingly, however, symptom reductions over time were comparable between both RCT and community user groups, raising the question of whether a reduced protocol may yield similar benefits.
This short paper attempts to deepen the discussion on the influence of the RCT environment on eMH engagement behavior by presenting engagement patterns of RCT participants and community users of an eMH tool, called myCompass, side by side. Our aim is to explore whether patterns of program engagement differ between RCT participants who receive usage recommendations as part of being involved in an evaluation study versus users in the general community who receive the same usage recommendations only on the myCompass homepage. The goal of this paper is to examine engagement rather than outcomes; therefore, we are presenting cluster analysis findings that help visualize behavioral patterns rather than quantify differences in health and well-being.
The present analyses are based on the first version of myCompass, an eMH program that was available to all Australians between 2011 and 2018. myCompass is hosted and run by the Black Dog Institute, and funded by the Australian Department of Health. Version 1 of this self-guided program offered online mental health resources and activities to address mild to moderate symptoms of depression, anxiety, and stress. Core functionalities of myCompass were the daily tracking and learning activities components. The tracking function allowed users to track up to 3 moods, behaviors, or cognitions (eg, sadness, alcohol consumption, worry) in real time. Users indicated their current states on an interval scale from low (0) to high (10). Another main function of myCompass was learning activities. Learning activities were a set of 14 modules which aimed at aiding beneficial behaviors such as goal setting, sleep quality, or managing fear and anxiety. Each module was split into 2 to 3 sessions to promote skill-building exercises over the course of several days and took about 10 to 15 minutes to complete online.
Our study considered engagement data from 2 distinct samples: (1) an RCT participant sample and (2) a naturalistic community sample of general public users who freely adopted myCompass. All participants and users registered to version 1.0 of myCompass between October 2011 and October 2012. Trial participants and community users freely registered either to the research study or directly to the myCompass website. Trial recruitment took place through multiple avenues, such as social media posts on Facebook and announcements on the Black Dog Institute volunteer research register , while the myCompass website could be found spontaneously through internet search engines. RCT participants and general public users received the same usage recommendations on the myCompass home page of completing at least 2 modules and using the tracking function once a day (see ). All users, independent of whether or not they were part of the research study, set the frequency of program use reminders in line with their own preferences on the myCompass website. However, only research participants were able contact research staff.
The RCT sample comprised 231 participants allocated to the eMH treatment group for the initial evaluation study of myCompass . Exclusion criteria for entering the study were being younger than 18 or older than 75 years, not possessing an internet-enabled mobile phone, not having access to a computer with internet and email, and showing either minimal or severe symptoms of depression and anxiety as determined by symptom scores on the Depression Anxiety and Stress Scale [ ].
The community sample comprised 8391 adults who registered for myCompass on their own accord. Community members needed to provide a valid email address and mobile phone number to verify their willingness to register to the program. As part of the profile setup within the myCompass program, RCT participants and community users alike completed assessments of common mental health symptoms, which formed the basis for tracking recommendations made by the program . If scores indicated severe distress (ie, depression, anxiety, or a stress scores of 8 or higher out of 10) or suicidal ideation, the sign-up process was terminated and individuals were redirected to the Black Dog Institute website with information on how to seek immediate support.
Measurement of User Engagement
We selected 4 usage metrics to assess user engagement. These were number of logins, number of logged trackings (irrespective of how many items were tracked at any one time), number of learning activities started, and number of learning activities completed.
All analyses were conducted in SPSS version 25 (IBM Corp). Participants using myCompass as part of the RCT and those in the community sample were initially compared using chi-square and t tests. We then conducted 2 two-step cluster analyses to determine distinct usage groups within the RCT participant and community user samples. A two-step cluster analysis allows for the detection of naturalistic (ie, not hypothesis-driven) groupings within a data set by way of examining distances between data points (step 1). Based on these distance calculations, an algorithm (step 2) determines the number of groupings or clusters for each data set. In the current analysis, we selected the log-likelihood distance measure and the Schwarz Bayesian information criterion statistic to determine the most appropriate number of clusters.
Data were available for 8622 users of myCompass.shows the basic demographic information and online engagement behaviors of individuals taking part in the myCompass RCT and those signing up to myCompass via the program’s website. Although both groups had similar gender ratios (RCT: 160/231, 69.6% female; general public: 5563/8391, 66.30% female; P=.30), community users were on average 3 years older than RCT participants (P=.003). Notably, all average access and engagements statistics were higher for RCT participants than for general public users (all P values <.001).
|Variable||RCTa sample (n=231), mean (SD)||Community sample (n=8391), mean (SD)||F test||P values||d|
|Age (years)||38.79 (10.73)||41.78 (13.64)||9.07||.003||0.20|
|Female (%)||1.70 (0.46)||1.66 (0.47)||1.07b||.30||N/A|
|49-day logins (n)||11.26 (15.78)||3.90 (6.91)||229.21||<.001||1.01|
|49-day trackings (n)||11.26 (10.42)||2.57 (5.95)||343.48||<.001||1.24|
|49-day modc started (n)||1.14 (1.62)||0.53 (0.85)||113.81||<.001||0.71|
|49-day mod completed (n)||1.02 (1.53)||0.12 (0.53)||555.85||<.001||1.57|
aRCT: randomized controlled trial.
Two-Step Cluster Analyses
A number of RCT participants allocated to the myCompass group did not proceed to register to myCompass (73/231, 31.6%) and therefore did not provide any engagement data. To make the RCT group more comparable to the general public sample (where each person registered to myCompass and provided engagement data), we removed these participants from the sample used for the cluster analysis.shows the results of the cluster analysis among RCT participants who were allocated to myCompass for the duration of the 7-week intervention period. The cluster analysis yielded 2 distinct user groups among the RCT participants. Results of a multivariate analysis of variance confirmed significant cluster group differences on all 4 usage variables (all P values <.001; see ), indicating that the clustering procedure was successful in establishing distinct user groups.
The first and larger user group (114/158, 72.2%) were “10-Timers” who logged into their assigned program around 9 times (mean 8.84, SE 2.01) and used myCompass mainly for tracking (mean 7.77, SE 1.83) rather than completing learning activities (mean 0.23, SE 0.21). The second user group (44/158, 27.8%) were “30+-Timers”, who logged into myCompass every other day (mean 36.20, SE 1.71). When the 30+-Timers logged on, they used the program’s tracking function (mean 34.59, SE 1.56) in addition to starting and completing around three learning activities (modules started: mean 3.36, SE 0.19; modules completed: mean 2.67, SE 0.18) over the course of 7 weeks.
|Variables||Nonstarters (n=73), mean (SE)||10-Timers (n=114), mean (SE)||30+-Timers (n=44), mean (SE)||F test (1, 156)||P values|
|Age (years)||38.79 (10.73)a||37.08 (10.71)||41.40 (8.09)||5.82||.02|
|Female (%)||1.70 (0.46)a||1.70 (0.46)||1.66 (0.48)||0.24||.62|
|Logins (n)||N/A||8.84 (2.01)||36.20 (1.71)||184.53||<.001|
|Trackings (n)||N/A||7.77 (1.83)||34.59 (1.56)||214.73||<.001|
|Modb started (n)||N/A||0.77 (0.22)||3.36 (0.19)||145.22||<.001|
|Mod completed (n)||N/A||0.23 (0.21)||2.67 (0.18)||156.73||<.001|
aDemographic information on Nonstarters is included for informational purposes and was not part of the analysis.
presents the results of the cluster analysis in the general public. Similar to findings among RCT participants, the community sample cluster analysis yielded 2 distinct clusters that differed across all usage variables (all P values <.001). The vast majority of general public users (7681/8391, 91.54%) were “2-Timers” or individuals who entered myCompass approximately twice (mean 2.25, SE 0.17) and used the tracking function once (mean 1.13, SE 0.14). The average module completion neared zero in this group (mean 0.03, SE 0.02).
The second and considerably smaller group were the “20-Timers” (710/8391, 8.46%) who used myCompass consistently. Members of this group logged in on average 22 times (mean 21.82, SE 0.16) and used the tracking function the majority of the time (mean 18.06, SE 0.14) over a 7-week period. In addition, 20-Timers started about 2 modules (mean 1.98, SE 0.03) and completed 1 (mean 1.07, SE 0.02) during this time.
Graphical representations of the cluster solutions can be found in Figures S1 and S2 in.
|Variables||2-Timers (n=7681), mean (SE)||20-Timers (n=710), mean (SE)||F test (1, 8389)||P values|
|Age (years)||41.70 (13.67)||42.83 (13.33)||4.59||.03|
|Female (%)||1.66 (0.47)||1.67 (0.47)||0.08||.78|
|Logins (n)||2.25 (0.17)||21.82 (0.16)||13803.55||<.001|
|Trackings (n)||1.13 (0.14)||18.06 (0.14)||14057.60||<.001|
|Modules started (n)||0.40 (0.03)||1.98 (0.03)||3145.51||<.001|
|Modules completed (n)||0.03 (0.02)||1.07 (0.02)||3670.39||<.001|
This paper presents findings on individual usage behaviors for an Australian eMH tool, either as part of an RCT or as an open-access tool freely adopted by the general public. Exploratory findings reveal that the same number of usage groups emerged in both data sets: a large “lower-intensity” usage group and a smaller “higher-intensity” usage group. However, our findings revealed interesting differences between the 2 data sets that warrant consideration. First, the general community group tended to be older than the RCT group and overall used the eMH program significantly less frequently based on all usage metrics. Of further note, a considerable number of individuals registered to myCompass directly and were not part of the research trial. This could be because participation in a research study is more time intensive, as research volunteers are required to complete psychometric measures in addition to using the eMH tool. It is also possible that only a relatively smaller number of individuals were exposed to the research trial recruitment calls, while a greater number of interested individuals were able to discover the myCompass website using internet search engines.
Second, although the cluster analytical findings revealed 2 behavioral groups across both samples, the magnitude of usage was higher in the RCT sample for both usage groups. For example, the low-usage group in the RCT logged in an average of 9 times, while the low-usage group in the general community logged in only about twice on average. The high-usage group in the RCT sample was, again, not only higher in magnitude (about 35 logins on average as opposed to about 20 logins in the general community), but also proportionally bigger than that in the community sample. Specifically, 27.8% (44/158) of RCT users were identified as frequent users (30+-Timers), whereas only 8.46% (710/8391) of general public users were identified as such (20-Timers). Accordingly, the 30+-Timers RCT group completed the recommended 2 or more learning activities and came closest to the tracking recommendations of 49 logged trackings, whereas the 20-Timers community sample group clearly did not meet the tracking or learning activity recommendations. Thus, only the high-usage RCT group could be described as “adhering” to the learning activity recommendations, and no group adhered to the tracking recommendations.
Our findings add weight to Cavanagh’s  concerns about external validity in eMH trials, suggesting that inferences about real-world engagement from RCT data indeed should be made with caution. Although sample composition and usage patterns were comparable between the RCT and general community users, generalizing from the RCT sample would have overestimated the magnitude of real-world program engagement. One potential reason for this could be the differing motivation for eMH adoption. In our study, RCT participants seemed to be more motivated to use the eMH program and to use the core functionalities more consistently than were users in the general public. It is possible that, beyond the willingness to participate in mental health research, the aims stated in the Participant Information Statement inadvertently attracted individuals who were interested in the topic of eMH and therefore more motivated to engage with an eMH tool in general and with the activities recommended to them in particular. On the other side of the behavioral spectrum, we uncovered a unique set of individuals in the RCT population who were willing to participate in the RCT but did not proceed to register to the online mental health tool (ie, Nonstarters). These participants were not representative of the community population because they did not encounter the eMH tool at all.
Our study is unique in that we contrasted RCT and general public behavioral patterns for the same eMH program during the same time period, maximizing the comparability of users’ eMH experience while minimizing the influence of historical factors. However, some limitations of our analyses warrant consideration. First, our analytic techniques only allowed for a limited number of engagement variables, but many other variables, such as the number of tracking reminders a user sets, could have provided us with a more detailed picture of eMH engagement behavior. Second, we were only able to study those users who actually engaged with the core functionalities of the program repeatedly; therefore, our findings largely reflect individuals motivated to adopt an eMH program. Third, the data reported in this paper were collected between 2011 and 2012 and certainly would have made a timelier contribution then. Technological advances since this time include improvements in interface design and user experience features, such as chatbots, gamification, and virtual reality [, ]. However, these relatively more high-tech solutions have yet to be fully integrated into the digital mental health landscape [ ]. Thus, we believe that the general knowledge and discussion derived from this analysis, which inspected usage behaviors along common metrics such as logins, module usage, and mood monitoring, still bears relevance today. Fourth, the observed effect may be limited in scope. It is possible that the findings presented in this paper only apply to unguided eMH interventions. Usage patterns may differ for eMH programs that provide therapist assistance, which generally facilitates engagement [ ]. Last, we did not examine the significance of differences observed across samples. This short paper focused on eMH engagement rather than outcomes, as our goal for this study was to reignite a discussion of the real-world applicability of eMH engagement data derived from RCT findings.
In summary, our findings suggest that eMH engagement in RCTs likely matches the type of eMH engagement in real-world users, but may overestimate the magnitude of such engagement. This could be an important consideration for eMH researchers, designers, and policy makers, as they implement eMH tools after efficacy is established. We recommend that future eMH trials examine whether participant selection and per protocol instructions affect usage behavior, and if so, that due consideration be given to this in implementation planning. Similarly, those planning a wide-scale rollout of new eMH tools should consider which aspects of the original trial may help in promoting usage and whether similar methods can be used in the real world. Ecological validity in eMH engagement science is also relevant to theoretical investigations of how eMH tools improve individuals’ well-being. Mechanisms of change established in RCTs must be practicable in the real world for eMH to deliver on its promise of effective and scalable mental health care. Ultimately, the goal of improving eMH engagement science is to set realistic expectations of eMH benefits—both health and economic—and understand how to maximize these. The more accurately we can speak to eMH engagement, the more fruitful both eMH science and policy will be going forward.
Conflicts of Interest
Supplementary figures.DOCX File , 81 KB
- Kendall P, Comer J, Chow C. The randomized controlled trial: basics and beyond. In: Comer J, Kendall P, editors. The Oxford Handbook of Research Strategies for Clinical Psychology 9780199793655. New York: Oxford University Press, Incorporated; Apr 23, 2013:40-61.
- Rothwell PM. Factors that can affect the external validity of randomised controlled trials. PLoS Clin Trials 2006 May;1(1):e9 [FREE Full text] [CrossRef] [Medline]
- Baumel A, Kane JM. Examining predictors of real-world user engagement with self-guided eHealth interventions: analysis of mobile apps and websites using a novel dataset. J Med Internet Res 2018 Dec 14;20(12):e11491 [FREE Full text] [CrossRef] [Medline]
- Cavanagh K. Turn on, tune in and (don't) drop out: engagement, adherence, attrition, and alliance with internet-based interventions. In: Bennett-Levy J, Richards D, Farrand P, Christensen H, Griffiths K, Kavanaugh D, et al, editors. Oxford Guide to Low Intensity CBT Interventions (Oxford Guides to Cognitive Behavioural Therapy). New York: Oxford University Press; May 2010:227-233.
- Sieverink F, Kelders S, Poel M, van Gemert-Pijnen L. Opening the black box of electronic health: collecting, analyzing, and interpreting log data. JMIR Res Protoc 2017 Aug 07;6(8):e156 [FREE Full text] [CrossRef] [Medline]
- Torous J, Lipschitz J, Ng M, Firth J. Dropout rates in clinical trials of smartphone apps for depressive symptoms: A systematic review and meta-analysis. Journal of Affective Disorders 2020 Feb;263:413-419. [CrossRef] [Medline]
- Mohr DC, Weingardt KR, Reddy M, Schueller SM. Three problems with current digital mental health research . . . and three things we can do about them. Psychiatr Serv 2017 May 01;68(5):427-429. [CrossRef] [Medline]
- Ryan C, Bergin M, Wells JS. Theoretical perspectives of adherence to web-based interventions: a scoping review. Int J Behav Med 2018 Feb;25(1):17-29. [CrossRef] [Medline]
- Birk MV, Mandryk RL. Improving the efficacy of cognitive training for digital mental health interventions through avatar customization: crowdsourced quasi-experimental study. J Med Internet Res 2019 Jan 08;21(1):e10133 [FREE Full text] [CrossRef] [Medline]
- Holmes NA, van Agteren JE, Dorstyn DS. A systematic review of technology-assisted interventions for co-morbid depression and substance use. J Telemed Telecare 2019 Apr;25(3):131-141. [CrossRef] [Medline]
- Blackman KC, Zoellner J, Berrey LM, Alexander R, Fanning J, Hill JL, et al. Assessing the internal and external validity of mobile health physical activity promotion interventions: a systematic literature review using the RE-AIM framework. J Med Internet Res 2013 Oct 04;15(10):e224 [FREE Full text] [CrossRef] [Medline]
- Christensen H, Griffiths KM, Korten AE, Brittliffe K, Groves C. A comparison of changes in anxiety and depression symptoms of spontaneous users and trial participants of a cognitive behavior therapy website. J Med Internet Res 2004 Dec 22;6(4):e46 [FREE Full text] [CrossRef] [Medline]
- Proudfoot J, Clarke J, Birch M, Whitton AE, Parker G, Manicavasagar V, et al. Impact of a mobile phone and web program on symptom and functional outcomes for people with mild-to-moderate depression, anxiety and stress: a randomised controlled trial. BMC Psychiatry 2013;13:312 [FREE Full text] [CrossRef] [Medline]
- Lovibond SH, Lovibond PF. Manual for the Depression Anxiety Stress Scales (2nd Ed.). Sydney: Psychology Foundation; 1995.
- Gaffney H, Mansell W, Tai S. Conversational agents in the treatment of mental health problems: mixed-method systematic review. JMIR Ment Health 2019 Oct 18;6(10):e14166 [FREE Full text] [CrossRef] [Medline]
- Kaonga NN, Morgan J. Common themes and emerging trends for the use of technology to support mental health and psychosocial well-being in limited resource settings: A review of the literature. Psychiatry Res 2019 Nov;281:112594. [CrossRef] [Medline]
- Johansson R, Andersson G. Internet-based psychological treatments for depression. Expert Review of Neurotherapeutics 2012 Jul;12(7):861-870. [CrossRef] [Medline]
|eMH: e-mental health|
|RCT: randomized controlled trial|
Edited by G Eysenbach; submitted 20.02.20; peer-reviewed by K Mathiasen, B Lo; comments to author 23.06.20; revised version received 30.09.20; accepted 11.12.20; published 11.03.21Copyright
©Samineh Sanatkar, Peter Baldwin, Kit Huckvale, Helen Christensen, Samuel Harvey. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 11.03.2021.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.