Effects of Internet-Based Cognitive Behavioral Therapy in Routine Care for Adults in Treatment for Depression and Anxiety: Systematic Review and Meta-Analysis

Background: Although there is evidence for the efficacy of internet-based cognitive behavioral therapy (iCBT), the generalizability of results to routine care is limited. Objective: This study systematically reviews effectiveness studies of guided iCBT interventions for the treatment of depression or anxiety. Methods: The acceptability (uptake, participants’ characteristics, adherence, and satisfaction), effectiveness, and negative effects (deterioration) of nonrandomized pre-post designs conducted under routine care conditions were synthesized using systematic review and meta-analytic approaches. Results: A total of 19 studies including 30 groups were included in the analysis. Despite high heterogeneity, individual effect sizes of the investigated studies indicate clinically relevant changes, ranging from Hedges’ g=0.42 to 1.88, with a pooled effect of 1.18 for depression and 0.94 for anxiety studies. Uptake, participants’ characteristics, adherence, and satisfaction indicate a moderate to high acceptability of the interventions. The average deterioration across studies was 2.9%. Conclusions: This study provides evidence supporting the acceptability and effectiveness of guided iCBT for the treatment of depression and anxiety in routine care. Given the high heterogeneity between interventions and contexts, health care providers should select interventions that have been proven in randomized controlled clinical trials. The successful application of iCBT may be an effective way of increasing health care in multiple contexts. (J Med Internet Res 2020;22(8):e18100) doi: 10.2196/18100


Introduction
Depressive and anxiety disorders are common mental health problems associated with significant suffering, impairment, and reduction in the quality of life [1,2]. Both disorders lead to considerable socioeconomic costs through decreased work productivity and higher utilization of health care services [3,4].
Despite the proven effectiveness of psychotherapy in the treatment of depression and anxiety [5], the provision of evidence-based treatments remains a constant challenge, given barriers such as the shortage of treatment, uneven distribution of trained providers, delayed treatment provision, and inadequacy of treatment [6,7]. Furthermore, research on patients' preferences has shown that many neither make use of psychotherapeutic treatments nor receive psychopharmacological treatment [7]. Using the internet to provide psychotherapeutic interventions may increase the coverage of usual care services [8,9] by providing highly accessible and scalable interventions that reach people who cannot be reached otherwise. Recent research suggests that internet-based cognitive behavioral therapy (iCBT) with therapeutic guidance is effective for the prevention [10,11] and treatment [12-15] of common mental disorders. Systematic reviews have also shown effects comparable to those of face-to-face treatments in adults [16,17]. In a recent meta-analysis, Romijn et al [13] showed that iCBT interventions for anxiety disorders can also have significant effects in trials implemented in clinical care. They also found that, in comparisons with waitlist-control groups, effects were smaller in samples recruited in clinical practice than in samples recruited with an open recruitment method [13], which raises the question of the effects of iCBT when implemented in routine practice.
Although randomized controlled trials (RCTs) are considered the gold standard in exploring the efficacy of mental health interventions, the idealized and controlled nature of these trials limits the generalizability of findings to routine care populations [18]. RCTs maximize the internal validity, to ensure that the effect found can be attributed to the investigated intervention [19,20]. Thus, RCT findings are restricted by controlled protocols, explicit eligibility criteria, and patient recruitment and randomization procedures. RCTs provide a highly structured environment, which is considered to possibly have an adherence-fostering effect [21,22]. The efficacy derived from RCTs of internet-based interventions might be overestimated for what can be expected when implementing in routine care, limiting the knowledge base for routine clinical practice [20].
Hence, after the efficacy of an intervention has been established and the intervention has been implemented, so-called phase IV clinical trials should follow, investigating the benefits as well as the potential negative effects of the intervention when implemented [23,24]. Thus, the investigation of the effectiveness of iCBT under routine care conditions is an important part of the evaluation of these services before wide-scale adoption.
Andersson and Hedman [25] reported on the effectiveness of iCBT within four controlled trials and eight open studies for a multitude of mental health problems, indicating that it might be possible to replicate the findings of controlled efficacy trials on guided iCBT in clinical practice. However, in that review, both routine care studies and RCTs were included, and only eight studies reported effects when the service was delivered under routine conditions. Recently, Andrews et al [15] reported the results of computer-based treatments of depression, panic disorder, generalized anxiety disorder, and social phobia in randomized trials. They also identified eight studies on internet-based treatments delivered in routine clinical practice outside of a randomized trial, which reported an average effect size of g=1.07 across all 4 disorders [15]. However, since then, many more studies have been published. In addition, that review did not specifically try to identify nonrandomized trials, so relevant articles may have been missed. Moreover, it did not provide disorder-specific results or, because guided and unguided treatments were mixed, specific results on guided treatments, and it did not investigate acceptability or potential negative effects.
The aim of this study was to examine the effects of guided iCBT for the treatment of depression and anxiety under routine care conditions on symptom change, acceptability (uptake, participants' characteristics, adherence, and satisfaction), and predictors of negative effects (deterioration and side effects).

Methods
We report this meta-analysis in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines (Multimedia Appendix 1) [26]. This meta-analysis was registered with the International Prospective Register of Systematic Reviews (PROSPERO; registration number: CRD42018095704).
We searched PubMed, PsycINFO, and the Cochrane Library. We used index terms and text words associated with depression and anxiety, internet interventions, and routine care (for the full search string, see Multimedia Appendix 2). In addition, we contacted experts in the field to ask whether they were aware of any studies that we had not identified through our systematic literature searches, and we conducted reference tracking on the identified studies and previous meta-analyses in the field [5,14,15,27]. The resulting hits of our literature searches were screened on titles and abstracts by 2 independent reviewers (AE and CV). Studies considered potentially relevant were screened on full text by the same reviewers independently. In case of disagreement, the opinion of a third senior reviewer (DE) was sought.

Inclusion Criteria
We included studies that (1) examined the effectiveness of a guided or blended iCBT in (2) treating adults with depressive and/or anxiety symptoms (3) under routine care conditions (4) in a pre-post design. We followed the inclusion of adults and older adolescents (aged >16 years) within the treatment provision for adults, as reported in the original studies.
We defined routine care studies as effectiveness studies, which were conducted as nonrandomized clinical trials in settings equal to or representative of routine practice [28]. The definition of routine care differs between countries and health care systems and describes the established way of working at the time of the original study. Depression and anxiety symptoms had to be established based on cutoff scores on self-report outcome measures, clinical diagnosis, or expert opinion. The definition of anxiety symptoms is based on the Diagnostic and Statistical Manual of Mental Disorders IV classification criteria for anxiety disorders. Furthermore, the interventions were considered as guided when the guidance was related to the therapeutic content [29] and as blended when the internet-based intervention was combined with face-to-face elements in one integrated standardized treatment protocol [27]. Guidance can be delivered via email, a secure message system, telephone, or face-to-face contact and via video or face-to-face contact in blended treatments. Finally, both disorder-specific and transdiagnostic interventions (targeted at both depression and anxiety simultaneously) were included.

Exclusion Criteria
We excluded studies that did not (1) focus primarily on anxiety or depression or (2) provide sufficient data for the calculation of the effect sizes. Studies were also excluded if (1) the service had only been provided as part of a research study, (2) the study could be considered as a feasibility or pilot trial, or (3) patients were randomized at an individual level. However, cluster randomized trials were considered eligible, in which randomization took place not on an individual level but, for example, on a health care institution level. For the definition of feasibility and pilot trials, we followed the NIHR Evaluation, Trials and Studies Coordinating Centre definition of pilot and feasibility trials [30], as recommended by Arain et al [31]. Feasibility trials were defined as "pieces of research done before a main study" (designed around the research question "Can this be done?"), and pilot studies are defined as a version of the main study that is "run in miniature to test whether the components of the main study can all work together" [31]. Additionally, we only included studies published in English, German, or Dutch language.

Data Extraction
We extracted data related to study and iCBT service-related characteristics, acceptability, effects on symptom change, negative effects, and data related to the risk of bias of reported results.
Study characteristics included the year of publication, the country in which the study was conducted, the year of data collection, sample size, eligibility criteria (establishment of depression and/or anxiety diagnosis at baseline [standardized clinical interview, cutoff on standardized questionnaire, or clinical judgment], inclusion of severe cases [yes/no], and exclusion of cases with suicidal ideation [yes/no]), and approach to data analysis (intention-to-treat [ITT]/completer).
iCBT service-related characteristics included intervention name, the symptoms targeted (depression and/or anxiety), whether it was a blended treatment (yes/no), evidence base for the used intervention (positive results based on at least one randomized clinical trial [yes/no]), and whether it was a symptom-specific or transdiagnostic treatment. We also included the recruitment pathway (open community, clinical referral, or both), the number of planned intervention modules, guidance focus (content-focused, motivational-focused, and …), training of professionals in iCBT (yes/no), supervision of professionals by a trained clinician (yes/no), the planned and actual intensity of guidance in minutes, and whether a guidance manual was provided (yes/no). We additionally recorded whether a standardized procedure for cases of symptom deterioration and crisis had been established (yes/no).
Acceptability data were extracted with regard to uptake (the number of people screened for the service, people included, and participants starting the treatment), patient characteristics (age and gender), average symptom severity at baseline, adherence (ie, number of completed modules), mean treatment duration in weeks, and participant satisfaction. Negative effects were extracted with regard to average effects on symptom deterioration, other side effects, and reports of specific subgroups at risk for symptom deterioration.
Two reviewers (AE and CV) extracted the data independently, and data sets were merged. Differences and points of uncertainty were discussed and checked by returning to the original article and in some cases to the authors of the respective article.

Risk of Bias Assessment
Assessing the quality of naturalistic observational studies is challenging, as there is no widely accepted tool for doing so [32]. Moreover, established guidelines for the quality assessment of nonrandomized trials are only partially applicable, as they assume comparisons of interventions (Risk Of Bias In Nonrandomized Studies of Interventions [ROBINS-I] [33]). Thus, in this study, we selected criteria from two quality assessment tools [33,34] and adapted them to this study's purposes to evaluate the risk of bias of the included studies. For the present risk of bias assessment, we discussed the aforementioned assessment tools among all coauthors of this manuscript and derived the analysis criteria described in Multimedia Appendix 3 [35]. As a result, we evaluated (1) researcher allegiance (defined as the first or last author of the study also being the first or last author of the intervention development or efficacy paper), (2) confounding introduced by patients' participation in other treatments, (3) confounding introduced by significant confounding variables identified within the individual study (meaning any predictors included such as age, guidance, or recruitment pathway), (4) selection bias introduced by the study population (ie, whether the studies reported only completer data), and (5) selective outcome reporting in comparison with the study protocol or diagnostic measures administered as mentioned in the original studies' methods section. A description of the risk of bias assessment and its operationalization can be found in Multimedia Appendix 3.
With regard to Researcher Allegiance, we chose the above definition after consideration among the authors and evaluated a study as at high risk of researcher allegiance when the first or last author of the study was also involved in the development of the treatment manual of the psychotherapy involved or the reporting on the interventions' efficacy. Although the validity of other indicators has been questioned, the involvement of a researcher in developing the treatment under investigation can be considered a valid indicator of potential researcher allegiance [36].
Two reviewers evaluated the quality of the included studies independently (AE and CV). Any disagreement between reviewers was solved by a thorough discussion. If the disagreement could not be resolved, a third senior reviewer was consulted (DE).

Statistical Analysis
Our primary outcome was the reduction of depressive or anxiety symptoms from pre- to posttest assessment. We calculated the difference in depression and anxiety symptoms between the pre- and postassessment, divided by the weighted, pooled standard deviation (Hedges' g). We chose Hedges' g because it allows for small sample size bias correction [37]. As we expected considerable heterogeneity among the studies, we used the random effects model. As a rule of thumb, effect sizes of 0.8 can be viewed as large, 0.5 as moderate, and 0.2 as small [38]. In our main analysis, we included mixed depression and/or anxiety studies in both the separate depression and anxiety data sets. Statistical analysis was conducted using the Comprehensive Meta-analysis program (version 2.2.2), and pooled proportions were calculated with the R [39] package meta [40].
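As a minimal sketch of the effect size described above, the following Python function computes a pre-post standardized mean difference with a small-sample correction. The pooled SD is taken here as the root mean square of the pre and post SDs, and the correction factor uses n−1 degrees of freedom; both choices are assumptions for illustration, because the exact formulas applied by the software are not stated in the text. The input scores are made up, and the label for values below 0.2 is likewise not from the source.

```python
import math

def hedges_g_prepost(mean_pre, mean_post, sd_pre, sd_post, n):
    """Pre-post standardized mean difference with small-sample correction.

    Assumptions (not taken from the paper): the pooled SD is the root mean
    square of the pre and post SDs, and the correction uses n - 1 degrees
    of freedom.
    """
    sd_pooled = math.sqrt((sd_pre ** 2 + sd_post ** 2) / 2.0)
    d = (mean_pre - mean_post) / sd_pooled  # positive = symptom reduction
    correction = 1.0 - 3.0 / (4.0 * (n - 1) - 1.0)
    return correction * d

def magnitude(g):
    """Rule-of-thumb labels cited in the text (0.2 small, 0.5 moderate, 0.8 large)."""
    size = abs(g)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "below small"  # label for values under 0.2 is an assumption

# Made-up scores for one hypothetical treatment group
g = hedges_g_prepost(mean_pre=15.0, mean_post=10.0, sd_pre=5.0, sd_post=5.0, n=50)
print(round(g, 2), magnitude(g))
```

With larger samples the correction factor approaches 1, so the corrected and uncorrected estimates converge.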
To quantify heterogeneity, we used the I² statistic and its 95% CI, which expresses heterogeneity as a percentage. Heterogeneity was interpreted as low, moderate, and high at I² values of 25%, 50%, and 75%, respectively.
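The relationship between the random effects model and the I² statistic can be sketched as follows. The DerSimonian-Laird estimator of the between-study variance is used here as an assumption, since the text does not name the estimator; the effect sizes and variances are made up.

```python
import math

def random_effects_pool(effects, variances):
    """DerSimonian-Laird random effects pooling with Cochran's Q and I^2."""
    if len(effects) < 2:
        raise ValueError("at least two studies are required")
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    fixed_mean = sum(w * y for w, y in zip(weights, effects)) / total_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = total_w - sum(w ** 2 for w in weights) / total_w
    tau2 = max(0.0, (q - df) / c)  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return {"pooled": pooled, "se": se, "q": q, "tau2": tau2, "i2": i2}

# Three hypothetical studies with equal sampling variance
result = random_effects_pool([0.5, 1.0, 1.5], [0.04, 0.04, 0.04])
print({key: round(value, 3) for key, value in result.items()})
```

I² expresses the share of the total variation in effect sizes that exceeds what sampling error alone would produce, which is why it is reported as a percentage.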
We also included the correlations between the pre- and postmeasurements of the outcome measures used. Where none was provided, we used mean correlations of 0.76 for depression and 0.59 for anxiety, following the study by Balk et al [41]. We also conducted sensitivity analyses with correlations set to 0.00, 0.75, and 0.99 to examine the robustness of our findings [42]. Finally, we calculated the prediction interval, which estimates where the true effects are to be expected for 95% of similar studies that might be conducted in the future [43].
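To illustrate why the assumed pre-post correlation matters, the sketch below uses a common textbook approximation for the sampling variance of a paired standardized mean difference and the usual formula for a 95% prediction interval with k−2 degrees of freedom. Both formulas are assumptions about how such quantities are typically computed rather than the paper's documented procedure, and all numbers are made up. The critical t value is passed in by the caller (2.201 corresponds to 11 degrees of freedom).

```python
import math

def prepost_g_variance(d, n, r):
    """Approximate sampling variance of a pre-post Hedges' g.

    d: uncorrected standardized mean difference
    n: number of participants with pre and post scores
    r: assumed correlation between pre and post scores
    """
    correction = 1.0 - 3.0 / (4.0 * (n - 1) - 1.0)
    var_d = (1.0 / n + d ** 2 / (2.0 * n)) * 2.0 * (1.0 - r)
    return correction ** 2 * var_d

def prediction_interval(pooled, se, tau2, t_crit):
    """Prediction interval: pooled +/- t * sqrt(tau^2 + SE^2)."""
    half_width = t_crit * math.sqrt(tau2 + se ** 2)
    return pooled - half_width, pooled + half_width

# A higher assumed correlation shrinks the variance, which gives the
# study more weight in the pooled estimate.
for r in (0.0, 0.59, 0.75, 0.99):
    print(r, round(prepost_g_variance(d=1.0, n=50, r=r), 4))

# Hypothetical pooled result from 13 studies (11 degrees of freedom)
low, high = prediction_interval(pooled=1.0, se=0.1, tau2=0.03, t_crit=2.201)
print(round(low, 2), round(high, 2))
```

Because the prediction interval adds the between-study variance to the uncertainty of the pooled mean, it is wider than the CI whenever heterogeneity is present.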
As we expected high heterogeneity, we conducted several subgroup analyses to investigate its possible sources. The examined subgroups were related to the method of analysis, time to post assessment, recruitment pathways, disorders, guidance moment (specific timing or as a reaction), guidance modality (email, message, and synchronous), guide profession (with or without specific CBT training), supervision provided (yes/no), guide training provided (yes/no), intervention manual provided (yes/no), approach to data analysis (ITT/completer), and diagnostic method (interview/questionnaire). Subgroup analyses were only carried out with regard to the effects on symptom change. We used the mixed effects model, testing pooled studies within subgroups with random effects models while testing for significant differences between those subgroups with fixed effects models. We only conducted subgroup analysis if the number of studies per category was not less than three. If necessary, we combined predefined subgroups to achieve the necessary group size.
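The comparison between subgroups can be sketched as a between-groups Q test on the subgroup estimates. The example below takes two pooled estimates with their 95% CIs, recovers standard errors from the CI widths (assuming normal-theory intervals), and tests the difference with one degree of freedom. The input values are the recruitment pathway estimates for depression reported in the Results; the computation is an illustration only and is not intended to reproduce the test reported in the paper.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Standard error recovered from a symmetric 95% CI (assumption)."""
    return (upper - lower) / (2.0 * z)

def q_between(estimates, standard_errors):
    """Fixed-effect test for differences between subgroup estimates."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    overall = sum(w * m for w, m in zip(weights, estimates)) / sum(weights)
    q = sum(w * (m - overall) ** 2 for w, m in zip(weights, estimates))
    return q, len(estimates) - 1

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

# (g, CI lower, CI upper) for two recruitment pathways
subgroups = {
    "community only": (1.37, 1.16, 1.59),
    "clinical or both": (1.05, 0.95, 1.14),
}
means = [g for g, _, _ in subgroups.values()]
ses = [se_from_ci(lo, hi) for _, lo, hi in subgroups.values()]

q, df = q_between(means, ses)
p = chi2_sf_df1(q)  # valid here because df == 1
print(round(q, 2), df, round(p, 4))
```

The closed-form tail probability only holds for two subgroups; comparisons of three or more subgroups would need a general chi-square distribution function.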
Finally, we conducted meta-regression analyses for the continuous variables, examining the duration of the treatment as a predictor of treatment outcome as well as guidance time, number of contacts, number of sessions completed, and the percentage of treatment completers.
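A meta-regression with a single continuous predictor can be sketched as a weighted least squares fit in which each study is weighted by the inverse of its total variance. The between-study variance tau² is supplied as a fixed, hypothetical value here, whereas dedicated software would estimate it; the data are made up.

```python
def weighted_regression(x, y, weights):
    """Weighted least squares fit of y = intercept + slope * x."""
    total_w = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total_w
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total_w
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - x_bar) * (yi - y_bar)
              for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical studies: treatment duration in weeks and effect size g
duration_weeks = [6, 8, 10, 12, 14]
effect_sizes = [0.90, 1.00, 1.05, 1.20, 1.25]
sampling_variances = [0.02, 0.03, 0.01, 0.04, 0.02]
tau2 = 0.03  # assumed between-study variance

weights = [1.0 / (v + tau2) for v in sampling_variances]
intercept, slope = weighted_regression(duration_weeks, effect_sizes, weights)
print(round(intercept, 3), round(slope, 3))
```

The slope is read as the expected change in effect size per additional week of treatment.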
Regarding uptake, we calculated the proportion of (1) people included relative to the number of people screened, (2) starters relative to the number of people screened, and (3) starters relative to the number of people included. Adherence was analyzed as the percentage of modules completed, that is, the average number of sessions completed by the participants divided by the planned total number of sessions. We also coded the percentage of intervention completers, defined by a 100% completion rate. Additionally, we pooled the age and gender distribution as well as participant satisfaction extracted from the original studies. Furthermore, we pooled the deterioration rates reported in the original studies, that is, the percentage of individuals reported to show symptom deterioration (defined as a negative reliable change in the reported outcome).
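The uptake and adherence measures defined above reduce to simple ratios for each study. A minimal sketch with made-up counts (the function and key names are hypothetical):

```python
def uptake(screened, included, started):
    """The three uptake proportions defined in the text."""
    return {
        "included_of_screened": included / screened,
        "started_of_screened": started / screened,
        "started_of_included": started / included,
    }

def adherence_percent(mean_sessions_completed, planned_sessions):
    """Average share of the planned sessions that participants completed."""
    return 100.0 * mean_sessions_completed / planned_sessions

# Made-up counts for one hypothetical service
rates = uptake(screened=1000, included=300, started=220)
adherence = adherence_percent(mean_sessions_completed=4.5, planned_sessions=8)
print(rates, adherence)
```

Pooling such per-study proportions across studies additionally requires a meta-analytic model for proportions, which this sketch does not cover.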
Publication bias was examined by inspecting the funnel plot [44] and conducting the Egger test of the intercept with a one-tailed significance level of α=.05 [45]. In addition, we used Duval and Tweedie's trim and fill procedure [46] to adjust the effect size for missing studies.

Results
The study selection process is illustrated in Figure 1.

Table footnote i: Participants were initially identified as suitable to receive a low-intensity intervention for depression or low mood through the triage of a patient's self-assessment form by team leaders, all of whom were qualified CBT therapists. Patients then had an initial assessment with a psychological well-being practitioner who considered a person's suitability for MindBalance in reference to the patient's identified difficulties, goals, and the studies' inclusion and exclusion criteria (inclusion: to receive treatment of depression with little or no comorbid anxiety, appropriate for guided self-help in a primary-care setting as determined by current …

Risk of Bias of the Included Studies
The quality of the included studies varied. Of the studies, 66.7% (k=20/30) were rated with a high risk of bias on Researcher Allegiance. Of the studies, 63.3% (k=19/30) did not exclude patients who were participating in other psychotherapeutic treatments (Treatment Inclusion Confounding), and none of the studies reported on the adjustment for confounders in the data analysis. Intention-to-treat data could be extracted from 73.3% of the studies (k=22/30), and none of the studies were preceded by a published study protocol. The risk of bias assessment is depicted in Figure 2.

iCBT Service Acceptability
Acceptability data on uptake, participant characteristics across studies, adherence, and participant satisfaction were pooled. All acceptability results are depicted in Multimedia Appendices 6 and 7. The pooled results are presented in Table 2.

Participant Characteristics
The pooled percentage of female participants was 65

Participant Satisfaction
Of the 17 studies, 10 (58.8%) reported participants' satisfaction. Participant satisfaction outcomes were reported inconsistently, using varying measures and different reporting forms. Therefore, these data could not be pooled, but the detailed results and the data extracted on patient satisfaction are depicted in Multimedia Appendix 7 [81,121-123]. Among the studies reporting participants' satisfaction, five reported high and four reported very high satisfaction.

Depression
Effect sizes for changes in depression severity ranged from 0.66 to 1.88 (Hedges' g, k=13 studies), with 1 study (7.7%) reporting a moderate and 12 (92.3%) a large effect size.
The average pre-post effect size of all depression treatments was g=1.18 (95% CI 1.06-1.29), which can be considered a large effect. Heterogeneity was significant and high (I²=95%; 95% CI 94-97; P<.001). The prediction interval is 0.74-1.62, and we can expect that in 95% of all populations, the true effect size will fall within this range.
The details of these results are shown in Figure 3 and Table 3.

Footnotes to Table 3:
a Excluded studies [47-49] as well as the depression study by Ruwaard et al [50].
b Two excluded studies [51,52] as well as the posttraumatic stress disorder (PTSD) and panic disorder studies by Ruwaard et al [50] and the PTSD study by Titov et al [53].
c OCD: obsessive-compulsive disorder.

In this analysis, the pre-post measurement correlation was set to the actual pre-post correlation of the measure (between 0.36 and 0.78). Sensitivity analysis, with correlations set to 0, 0.75, and 0.99, resulted in comparable effect sizes (g Corr=0 =1.24, …). Neither the visual inspection of the funnel plot nor the Egger test (P=.90) indicated potential publication bias.
We found five studies to be outliers, defined as not overlapping with the 95% CI of the pooled estimate. Removing these studies [47-49], and the depression group in the study by Ruwaard et al [50], from the analysis did not result in meaningful changes in effect sizes (g=1.18, 95% CI 1.09-1.26 …).

Anxiety
The average pre-post effect size (Hedges' g) of all anxiety interventions, including the interventions that targeted both anxiety and depression, was g=0.94 (95% CI 0.83-1.06), which is considered a large effect. Heterogeneity was high (I²=89%; 95% CI 84-92; P<.001). The prediction interval is 0.44-1.44, and we can expect that in 95% of all populations, the true effect size will fall within this range. The details of these results are shown in Figure 4 and Table 3.
In the main analysis described above, the pre-post measurement correlation was set to 0.59. Sensitivity analysis with correlations set to 0, 0.75, and 0.99 … Neither the visual inspection of the funnel plot nor the Egger test (P=.91) indicated potential publication bias.
We found five studies to be outliers, as their results did not overlap with the 95% CI of the pooled estimate. Removing two studies [51,52] as well as the PTSD and panic disorder studies by Ruwaard et al [50] and the PTSD study by Titov et al [53] from the analysis did not influence the result significantly (g=0.90 …).

Subgroup Analysis for iCBT for the Treatment of Depression, Anxiety, or Mixed Depression and/or Anxiety
Tables 4 and 5 show the results of all examined subgroup analyses. Significant differences between subgroups were found for professional training of coaches, supervision of coaches, and treatment duration for both depression and anxiety studies and for recruitment pathways for depression studies only. Studies evaluating a period of 9 to 13 weeks of treatment duration reported a significantly lower effect size (depression: g=1.00, 95% CI 0.95 …).

Footnotes to Tables 4 and 5:
a Test against "Guidance format: face-to-face vs written guidance," "Guidance modality: Message, Email, Telephone, F2F," and "Guide profession" excluded, as there were too few studies included in analysis.
b Number of studies.
c Only two studies included via the clinical pathway only. We combined the categories "Both, community and clinical" and "clinical" for this analysis.
d Excluding one study [54], as this is the only study using clinical judgment without specifying the use of an interview or questionnaire.
e We grouped all studies involving guides not specifically trained in delivering cognitive behavioral therapy in the category "non-professional" and studies involving psychiatrists, psychologists, or psychotherapists in their guidance in the category "other."
Depression studies that recruited only in community settings reported significantly higher effect sizes (g=1.37, 95% CI 1.16-1.59; I²=96%; 95% CI 94-98) than studies that recruited in clinical or in both clinical and community settings (g=1.05, 95% CI 0.95-1.14; I²=78%; 95% CI …). Across all recruitment pathways, effect sizes were large, but heterogeneity remained high. We did not find this difference in anxiety studies.
Depression studies that reported providing supervision to their coaches, training their professionals, and providing an intervention manual reported significantly higher effect sizes (g Supervision =1.27, 95% CI 1.13 …). There were no differences for any of the other examined subgroups, for either depression or anxiety studies.
Subgroup analyses comparing studies rated with high versus low risk indicated that Researcher Allegiance did not have a significant influence on the estimated effect sizes for either anxiety or depression studies. Heterogeneity was lower among the studies rated at low risk of bias on Researcher Allegiance (I²=39%) than among the studies rated at high risk (I²=90%). Moreover, anxiety studies rated as at high risk of Treatment Inclusion Confounding had higher estimated effect sizes. This was not replicated in subgroup analyses of interventions targeting depression. Anxiety studies at high risk of Selection Bias reported significantly lower effect sizes. Similar outcomes were not replicated in the depression trials.

Meta-Regression Analysis for iCBT for the Treatment of Anxiety, Depression, or Mixed Depression and Anxiety
Meta-regression analyses indicated that longer treatment duration in depression studies was associated with a higher effect (P=.02; β=0.03, R²=0.00). This effect was not found in anxiety studies (P=.94). None of the other examined variables, that is, guidance time, number of contacts, number of sessions completed, or the percentage of treatment completers, were significantly associated with the observed effect sizes in either depression or anxiety studies.

Discussion
This study aims to examine the acceptability, effects on symptom change, and negative effects of guided iCBT interventions in treating depression and anxiety in routine care. Regarding the uptake of the service, on average, 70.2% of people screened were not offered inclusion, and of those included, 73.0% started the intervention. The vast majority of participants reached were female, with an average age of 38.3 years, and 61.3% of participants completed the interventions as planned. Reported participant satisfaction was high, although inconsistently reported results did not allow us to pool effects. The average professional guidance time per participant was 133.49 min over the treatment duration. With regard to the effects on symptom change, the results indicated large average reductions for both depression (g=1.18; 95% CI 1.06-1.29) and anxiety (g=0.94; 95% CI 0.83-1.06). However, the heterogeneity between studies was high. Nevertheless, all examined effect sizes were at least moderate, indicating the intervention's potential when delivered under routine care conditions with effects ranging from moderate to large. The average deterioration rates were 3.2% for depression and 3.1% for anxiety. Subgroup analyses indicated a range of iCBT service-related characteristics to be associated with the observed treatment effects.
Regarding uptake, we found that many participants who were in contact with the iCBT service did not start the intervention. Pretreatment dropout is hard to assess, and, accordingly, reasons for not starting an iCBT intervention after inclusion have not been discussed in the original publications.
The average age of participants found in this study (mean 38.3 years) appears to be slightly lower than that reported in RCTs on guided iCBT interventions for the treatment of depression (mean 42.5 years [124]) but comparable with reports on the mean age of participants within guided iCBT interventions for the treatment of anxiety [125]. The percentage of females in the routine care study population was higher for depression studies than in RCTs on guided iCBT for the treatment of depression [124] and similar to reports on participants in guided iCBT interventions for the treatment of anxiety [125] in experimental settings. As similar distributions between female and male users are reported in face-to-face mental health service utilization [126], this effect might be explained by gender differences in help-seeking behavior rather than by iCBT service-related factors [127], as well as by gender differences in the prevalence of depression and anxiety disorders [128,129]. Future studies should focus on ways to attract men to use iCBT interventions.
The pooled reported percentage of sessions completed, that is, 62.6% in depression and 57.3% in anxiety studies, was lower than that described in meta-analyses on adherence in RCTs on iCBT interventions. Comparing the adherence to iCBT and face-to-face CBT, van Ballegooijen et al [130] reported that on average, participants completed 80.8% of treatment sessions in the iCBT and 83.9% in the face-to-face intervention [130]. Similarly, the percentage of participants completing the treatment as planned was lower (62.8% for depression and 61.7% for anxiety studies) than reported elsewhere [130,131]. These differences might be due to the assumed adherence-fostering effect of randomized controlled settings versus routine care [132]. However, completion rates were reported inconsistently across studies, applying different criteria such as study or treatment completers, including several definitions of treatment completions. To facilitate comparability, literature on iCBT completion should settle on one reporting standard. Further investigation of factors promoting the acceptance of iCBT interventions, also when reporting on effectiveness results in routine care, may lead to a deeper understanding that might foster intervention development and upscaling.
Results on the effectiveness of iCBT (g Depression =1.18, 95% CI 1.06-1.29 and g Anxiety =0.94, 95% CI 0.83-1.062) confirm findings of recently published systematic reviews and meta-analyses on RCTs of iCBT for depression and anxiety. Königbauer et al [12] found medium to large pre-post within-group effects ranging between −0.64 and −2.24 for interventions treating clinical depression [12]. To our knowledge, no recent meta-analysis has reported on pre-post effect sizes of studies targeting guided iCBT interventions for the treatment of anxiety. On an individual study level, pre-post effects in randomized trials ranged from 0.54 to 2.40 (please see Multimedia Appendix 8 for references)  compared with 0.66 to 1.88 in depression and 0.42 to 1.38 (Hedges' g) in anxiety within this analysis.
With regard to randomized pragmatic trials conducted under routine care conditions, Andrews et al [15] examined a sample of 64 papers reporting results of RCTs on the effectiveness of iCBT for the treatment of depression, panic disorder, generalized anxiety disorder, and social phobia in comparison with control groups in routine practice. This review reported effect sizes for depression, panic disorder, generalized anxiety disorder, and social phobia ranging from g=0.67 to 1.31 [15]. The same study identified eight papers investigating the effectiveness of iCBT, reporting an average effect size of g=1.07 across the treatment of depression, panic disorder, generalized anxiety disorder, and social phobia [15]. In the most recent meta-analysis of iCBT treatments for anxiety in an adult population, between-group effects on anxiety symptoms compared with control conditions were moderate to large (g=0.72; 95% CI 0.60-0.83; P<.001; I²=53) [13]. Additionally, the results of this study are in line with meta-analytic findings on face-to-face CBT treatments implemented in routine care, with pre-post effect sizes found in randomized trials ranging from d=0.69 to 2.28 for depression [28] and g=0.73 to 2.59 for anxiety treatments [163]. The deterioration rates (3.2% in depression and 3.1% in anxiety studies) were slightly lower than, but within the 95% CI of, findings based on RCTs for internet-based guided self-help interventions (3.36% for depression [164] and 5.8% for anxiety [165]), and also comparable with deterioration rates in face-to-face psychotherapy for depression [166]. Criteria defining deterioration varied between studies, and unfortunately, most primary studies did not report on other negative effects, nor did any study report predictors of deterioration. Such predictors seem of utmost importance for identifying those individuals who should potentially be referred to other mental health services. Their investigation is of specific importance within naturalistic study designs and under routine care conditions [164,165,167].
Most evaluated iCBT services for depression (69.2%) excluded severe cases and individuals with suicidal ideation (k=9/13) at baseline. However, a large-scale study showed that iCBT services can also result in positive effects on suicidality, reducing the prevalence of suicidal ideation from 50% at baseline to 27% after treatment [168]. In addition, a recent individual patient data meta-analysis on RCTs indicated that guided iCBT also resulted in clinically meaningful results in individuals with severe depression symptomatology [124]. Given that many individuals applying to iCBT services either do not have access to other immediate care or are not willing to utilize alternative treatment services, future studies should explore the balance between potential risks and benefits of opening up those services to populations showing elevated suicidal ideation. In such cases, it seems of utmost importance to monitor potential upcoming crises using standard operating procedures involving trained clinicians and to evaluate treatment success at the end of the service. In case of nonresponse, individuals should be motivated and guided to utilize other mental health care services, if available. Such standardized crisis procedures were reported by fewer than half of the studies included in this review. iCBT services in routine care might profit from clear pathways of referral to other services in cases of nonresponse and symptom deterioration. Furthermore, future research should improve our understanding of the effects of routine outcome monitoring in routinely applied iCBT [169], as this monitoring could help evaluate participants' progress throughout the course of treatment, using standardized outcome measures to elicit client feedback as part of a measurement-based care delivery approach in routine mental and behavioral health care [170,171].
The finding that treatment outcomes of depression interventions were greater when recruitment was carried out using an open recruitment strategy in a community setting compared with when recruited in a clinical setting is in line with the findings of Romijn et al [13] with regard to randomized pragmatic studies on anxiety disorder treatments. However, in our study, this interaction was only found for depression and could not be confirmed for anxiety disorders. One potential explanation for the difference in effects might be differences in the characteristics of the included patients. There is evidence that iCBT recruiting via open recruitment strategies, such as through web-based channels, might only reach a specific population that is different from those seeking help in a clinical setting [19]. It is often argued that internet interventions might reach individuals that would otherwise not seek treatment or only at a later time point. Given that, for example, the chronicity of depression is associated with worse treatment outcomes [172], the difference in effect might be explained by reaching a population with lower chronicity. However, such an assumption needs to be confirmed in future studies.
Further subgroup analyses indicated that iCBT services for the treatment of depression that utilized trained professionals (psychotherapists and psychiatrists) resulted in larger pre-post changes than iCBT services that used only nonprofessionals not trained in CBT (psychologists without specialized CBT training, nurses, GPs, counselors, coaches, and lived experience coordinators). However, we did not find this effect in the anxiety studies. Moreover, effects in the subgroup of depression studies involving nonprofessionals were large, indicating the potential to deliver iCBT services, for example, in contexts where there might be a shortage of trained clinicians. In cases where nonprofessionals deliver guidance in iCBT services, supervision by trained clinicians, including the availability of professionals for crisis intervention, seems warranted. Further subgroup analyses indicated that providing supervision to coaches is also associated with higher average treatment effects for depression, but not for anxiety studies. Furthermore, training the professional and providing an intervention manual were positively related to the interventions' effectiveness. This result must be interpreted with caution, as we coded all studies not mentioning supervision, training, or manual provision in their publication as not providing these components. In addition, these components do not inform us about actual treatment fidelity. Further research should focus on the effects on treatment outcomes of providing supervision, training, and intervention manuals to professionals working with iCBT interventions in routine care as well as on the assessment of treatment fidelity.
Moreover, we did not find a difference in effects on mean symptom change between iCBT services that applied diagnostic interviews for patient allocation and those that used self-reports only. This is in line with meta-analytic findings from RCTs on guided digital interventions for depression [124] and with studies directly comparing the effectiveness of iCBT services when treatment allocation was based on an automatic web-based assessment versus clinician assessment [173]. This indicates that such services can be used in contexts where implementing services with initial clinician assessment is not possible, without affecting average treatment success. However, it must be noted that although results might not indicate differences on the group level, using web-based assessments only, without a clinical assessment, might on an individual level overlook relevant diagnostic information that requires immediate attention, such as suicidal risk or an underlying treatment need for comorbid disorders such as PTSD.
The strengths of this meta-analysis include the exclusive focus on evaluating iCBT interventions for their acceptability and clinical outcomes under real-world conditions. Unlike previous systematic reviews that mixed efficacy with effectiveness trials, in this review, we focused only on studies conducted in regular care settings. This is important as we strive to report routine care results free from biases possibly being introduced within efficacy studies such as stricter application of protocolized procedures, eligibility criteria, and randomization [19-22]. Moreover, we presented an overview of implementation indicators existing in the included studies that can be used to gain a better understanding of how iCBT can be adopted by regular care services. Nevertheless, the findings of this study should be interpreted with caution due to several limitations.
First, the heterogeneity in our sample was high and significant, illustrating a great variation in the results of the included studies. Thus, we cannot draw firm conclusions regarding the average effect of iCBT in routine care. Moreover, within-group effect sizes are not an optimal estimator of the treatment effect because the pre- and posttreatment measurements are not independent of each other and because they do not account for recovery occurring independent of the treatment, thereby leading to an overestimation of the treatment effect [42]. However, in comparison with and on the basis of the reported efficacy of iCBT interventions established in RCTs, they are the best available indicator of the effects of iCBT solutions in a routine care environment. Furthermore, we found that treatment duration had a significant influence on treatment effects. This result also supports the hypothesis that findings on pre-post changes in symptom severity might have been influenced by spontaneous or unexplained recovery, which is a common factor in depression [174]. However, our main results are in line with within-group effect sizes found in RCTs, where spontaneous recovery also occurs, and we, therefore, conclude that our effects can be considered substantial. Although heterogeneity was not explained by any other of the examined subgroups, several assumptions can be made regarding its sources. One explanation for the high heterogeneity might be the influence of contextual factors of observational studies, such as sampling methods, participant characteristics, within-group effect sizes, and differences between the studies in reporting outcomes. It can be hypothesized that a greater harmonization regarding the conduct and reporting of effectiveness studies in routine care could lead to greater comparability of the studies' results. Another reason for the observed heterogeneity might be the different contexts of regular care facilities across different countries. There is great variability in the degree of e-mental health penetration in different countries. For instance, Australia is considered one of the frontrunners in the e-mental health field, whereas Norway adopted these interventions very recently [175]. Thus, professionals might differ in the way they interact with e-mental health around the world. Finally, the interventions might differ in the way that they have been developed. These results also imply the importance of establishing a firm evidence base for individual iCBT interventions before they are scaled up.
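To make the notion of heterogeneity more concrete, the following minimal sketch pools study-level effect sizes with a DerSimonian-Laird random-effects model and derives Cochran's Q and I². This estimator is a common choice and is assumed here for illustration only; it is not necessarily the model used in this analysis. The function name and all input values are hypothetical and are not data from the included studies.

```python
import math


def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling (illustrative sketch).

    effects:   study-level effect sizes (e.g., Hedges' g)
    variances: their sampling variances
    Returns the pooled effect, its 95% CI, tau^2, Q, and I^2 (in percent).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                       # fixed-effect weights
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0        # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]           # random-effects weights
    pooled = sum(wi * g for wi, g in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0    # share of variation beyond chance
    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "tau2": tau2,
        "q": q,
        "i2": i2,
    }


# Hypothetical effect sizes and variances for three studies.
result = random_effects_pool([0.5, 1.0, 1.5], [0.04, 0.04, 0.04])
print(result)
```

In this hypothetical example, most of the observed variation lies between studies rather than within them, which is the situation in which a single pooled estimate should be interpreted with caution.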
Second, conclusions on treatment effects might be biased by studies that included participants who could also participate in other psychotherapeutic treatments. Although this reflects routine practice, the data do not allow conclusions on the percentage of participants receiving additional treatment. Additionally, no study reported adjusting for confounders such as baseline symptom severity, treatment fidelity (provision and use), or changes in the treatment over the course of the studies, which should be considered in future reports on the effects of iCBT in routine care.
Future studies should add to the body of literature on iCBT interventions examined under routine care conditions. These studies should not focus solely on the effectiveness of the interventions but, if possible, should also report on specific service-, implementation-, and context-related outcomes. One way of achieving this might be through a taxonomy and guidelines for the reporting of iCBT effectiveness, implementation, and context outcomes in routine care. In contrast to standards of reporting RCTs, no such international standards exist when it comes to reporting nonrandomized intervention studies. Proctor et al [176] suggested a list of outcomes for implementation-related research, and Hermes et al [177] recently made suggestions on how to build upon these ideas to establish a measurement system for the implementation of behavioral intervention technologies. Moreover, such research should always be discussed and evaluated in the light of the quality criteria established to help all involved stakeholders, patients, practitioners, and decision makers at the local and policy level to identify not only effective but also safe interventions [178].
In conclusion, this study provides further evidence supporting the acceptability and effectiveness of guided iCBT for the treatment of depression and anxiety when implemented in routine care, whereas results on negative effects are less clear. Guided iCBT may be an effective way of overcoming barriers to treatment provision. It may substantially increase the coverage of usual care services and offer an innovative treatment format for the treatment of depression and anxiety.