Mobile Apps to Reduce Tobacco, Alcohol, and Illicit Drug Use: Systematic Review of the First Decade

Background: Mobile apps for problematic substance use have the potential to bypass common barriers to treatment seeking. Ten years after the release of the first app targeting problematic tobacco, alcohol, and illicit drug use, the effectiveness, use, and acceptability of such apps remain unclear.
Objective: This study aims to conduct a systematic literature review of trials evaluating mobile app interventions for problematic tobacco, alcohol, and illicit drug use.
Methods: The review was conducted according to recommended guidelines. Relevant databases were searched, and articles were included if the mobile app study was a controlled intervention trial and reported alcohol, tobacco, or illicit drug consumption as outcomes.
Results: A total of 20 studies met the eligibility criteria across a range of substances: alcohol (n=11), tobacco (n=6), alcohol and tobacco (n=1), illicit drugs (n=1), and illicit drugs and alcohol (n=1). Samples included the general community, university students, and clinical patients. The analyzed intervention sample sizes ranged from 22 to 14,228, and content was considerably diverse, ranging from simple stand-alone apps delivering self-monitoring or psychoeducation to multicomponent apps with interactive features and audio content, or apps used as adjuncts alongside face-to-face treatment. Intervention duration ranged from 1 to 35 weeks, with notifications ranging from none to multiple times per day. A total of 6 of the 20 app interventions reported significant reductions in substance use at post treatment or follow-up compared with a comparison condition, with small to moderate effect sizes. Furthermore, 2 other app interventions reported significant reductions during the intervention but not at post treatment, and 1 further intervention reported a significant interaction between two app intervention components.
Conclusions: Although most app interventions were associated with reductions in problematic substance use, less than one-third were significantly better than the comparison conditions at post treatment. Of the 6 apps that reported intervention effects, 5 targeted alcohol (of those, one also targeted illicit drugs and another also targeted tobacco) and 1 targeted tobacco. Moreover, 3 of the 6 apps included feedback (eg, personalized); 2 had a high risk of bias, 1 some risk, and 3 low risk. All 6 apps included interventions of 6 weeks or longer. Common study limitations were small sample sizes; risk of bias; lack of relevant details; and, in some cases, poorly balanced comparison conditions. Appropriately powered trials are required to understand which app interventions are most effective, the length of engagement required, and the subgroups most likely to benefit. In sum, evidence to date for the effectiveness of apps targeting problematic substance use is not compelling, although the heterogeneous comparison
J Med Internet Res 2020 | vol. 22 | iss. 11 | e17156 | p. 1 http://www.jmir.org/2020/11/e17156/ (page number not for citation purposes) Staiger et al JOURNAL OF MEDICAL INTERNET RESEARCH


Introduction
The problematic use of substances such as alcohol, tobacco, and illicit drugs is one of the leading causes of morbidity and mortality worldwide [1]. Despite the devastating health and social consequences, a large proportion of individuals who engage in problematic substance use do not seek formal treatment [2-4]. Help-seeking barriers include concern about anonymity, not knowing about or being able to access treatment, and the financial or time burdens of treatment [5-8]. Hence, interventions that can address some of these help-seeking barriers warrant attention to reduce the substantial negative impact of substances at a population level.
Mobile health (mHealth) interventions purport to overcome many of these help-seeking barriers by offering a population-based approach, improving access and affordability [9,10]. mHealth refers to health support delivered on mobile devices, such as cell phones, smartphones, and tablets [11], typically via dedicated apps but also via systems such as interactive voice response (IVR) and text messaging (SMS) [12,13]. Apps have rapidly become the most popular software method for delivering health support (ie, mHealth) on mobile devices. As of August 2019, a search of iTunes and Google Play indicated that over 45,000 mHealth apps were available. Surprisingly, despite the plethora of apps available to assist people in reducing problematic alcohol and other drug use, only a very small proportion of these apps are evidence based [14]. Evaluations of other health-related apps, for example in the field of mental health, have produced positive [15] or negligible [16] changes in the targeted behavior. Importantly, a number of early trials of apps that focused on problematic substance use have produced promising results, suggesting that apps could play a role in assisting individuals who are dependent on tobacco, alcohol, or illicit drugs to quit or maintain abstinence [17].
It has been approximately 10 years since the emergence of mobile apps designed to help people reduce or recover from problematic alcohol, tobacco, and/or illicit drug use [18]; 6 years since controlled trials have appeared [17]; and 5 years since the first publication of a systematic literature review of 6 smartphone apps for problematic substance use [19]. Of note, all but 1 of the previous reviews have examined the literature on digital interventions more broadly, and all have included only alcohol interventions. For example, Kaner et al [20] found that digital interventions (ie, delivered via a computer, smartphone, or mobile device) significantly lowered alcohol consumption, with an average reduction of up to 3 (United Kingdom) standard drinks per week compared with control participants. Similarly, a meta-analysis showed that internet-delivered alcohol interventions significantly reduced problematic drinking among adults, lowering consumption by 5 standard units of alcohol per week compared with the control group [21]. Berman et al [22] examined the use of mobile interventions (IVR, SMS, and apps) to reduce drinking in university students. Only 2 of the 7 reviewed studies used apps, and the authors found that only an IVR intervention resulted in a reduction in the primary outcome. Finally, the most recent related review focused on alcohol use in community participants and similarly included a range of mobile interventions beyond apps [23]; 5 of the 19 studies in that review included app-based interventions, with mixed findings. To date, there has been only 1 app-specific systematic review [19]; it included pilot studies and open trials and evaluated alcohol app studies across community and alcohol-dependent individuals. The authors found that 2 of the 6 mobile apps reviewed reported reliable positive outcomes, with a further 2 showing promise.
The authors highlighted the limited number of studies, small sample sizes, lack of control groups, and scarcity of rigorous designs within the field of mobile app interventions for problematic substance use. Since that review, which was conducted in 2015, there has been a three-fold increase in controlled evaluations of mobile apps designed to reduce substance use or aid recovery from substance dependence. Although some reviews have included apps when examining the effectiveness of alcohol interventions delivered via mobile devices [20,22,23], surprisingly, there has been no further synthesis of the evidence regarding the effectiveness of problematic alcohol use interventions delivered specifically via mobile apps. Moreover, no app-specific review has included smoking and illicit drug use to develop a comprehensive picture of the effectiveness of apps across the substance use field. Thus, this paper reports on the current evidence base regarding the effectiveness and feasibility of mobile apps designed to reduce problematic alcohol, tobacco, and illicit drug use. In addition, we report usability, adherence, retention, and engagement data where possible; this information is critical to understanding user experience and behavior alongside effectiveness data. With approximately 10 years having passed since an app targeting the reduction of substance use first appeared, it is timely to review the progress of the field.
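Because the amount of pure alcohol in a "standard drink" differs between countries, reductions such as the 3 UK standard drinks per week reported by Kaner et al [20] are easiest to compare once converted to grams. The Python sketch below assumes commonly cited national definitions (8 g for the United Kingdom, 10 g for Australia, and 14 g for the United States); these values and the helper names are illustrative assumptions rather than figures taken from the reviewed studies.

```python
# Grams of pure alcohol per "standard drink" under commonly cited national
# definitions (illustrative assumption, not data from the reviewed studies).
GRAMS_PER_STANDARD_DRINK = {
    "United Kingdom": 8.0,   # 1 UK unit = 10 mL of ethanol, about 8 g
    "Australia": 10.0,
    "United States": 14.0,
}


def convert_drinks(n_drinks: float, from_country: str, to_country: str) -> float:
    """Re-express a number of standard drinks under another country's definition."""
    grams = n_drinks * GRAMS_PER_STANDARD_DRINK[from_country]
    return grams / GRAMS_PER_STANDARD_DRINK[to_country]


# A weekly reduction of 3 UK standard drinks, in grams and in US standard drinks.
weekly_reduction_uk = 3
grams = weekly_reduction_uk * GRAMS_PER_STANDARD_DRINK["United Kingdom"]
us_drinks = convert_drinks(weekly_reduction_uk, "United Kingdom", "United States")
print(f"{grams:.0f} g per week = {us_drinks:.2f} US standard drinks")
# prints "24 g per week = 1.71 US standard drinks"
```

The conversion goes through grams so that any pair of national definitions can be compared without a separate ratio table.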

Aims and Guidelines
indicated that there was a wide range of mobile delivery methods and considerable variability in intervention content and control groups; hence, we decided a priori not to conduct a traditional meta-analytic review, given the risk of drawing premature conclusions. The review focused on primary consumption outcomes (ie, quantity and/or frequency of substance consumption) rather than related harms or secondary psychosocial outcomes. The search followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [24]. Data extraction was guided by the CONSORT-EHEALTH (Consolidated Standards of Reporting Trials-Electronic Health Checklist) [25]. Risk of bias was assessed using the Risk of Bias tool developed by the Cochrane Collaboration [26].

Literature Search and Screening
The literature search utilized the following large databases: MEDLINE (Medical Literature Analysis and Retrieval System Online), PsycInfo, EMBASE (Excerpta Medica dataBASE, via the OVID platform), and ERIC (Education Resources Information Center, via EBSCO). The databases were searched using variations of 3 key terms: Substance AND Intervention AND Smartphone App (refer to Multimedia Appendix 1 for the detailed search strategy).

Eligibility Criteria
The search was limited to papers published in peer-reviewed journals from January 1, 2007 (the year when the first smartphone was released) to February 1, 2019. Papers were eligible if they adhered to the following criteria: (1) they reported primary empirical data; (2) the primary focus of the intervention was reduction in the use of illicit drugs, and/or alcohol, and/or tobacco; (3) substance consumption outcome data were reported; (4) the intervention was delivered via a mobile device app (not web-based or SMS in isolation); and (5) a controlled trial design was employed using either a randomized or matched control methodology. There were no limitations on the type of control conditions employed. In addition, no language restrictions were imposed.
The software program Covidence (Veritas Health International) [27] was used to ensure independent screening and accurate calculation of agreement. As shown in Figure 1, the combined search identified a total of 2714 potentially relevant articles (including 4 papers sourced from the reference lists of existing reviews), which reduced to 2100 after duplicates were removed. In keeping with the methodology proposed by Moskowitz [28] and Foxcroft et al [29], the titles and abstracts of all papers were screened independently by 2 reviewers (KG and RO), with a random 20% cross-checked by PS. Any disagreements were discussed by the authors (PS, RO, and PL) and resolved by agreement, resulting in a total of 31 potentially relevant articles. These 31 papers were read independently and in full by RO and PS, and 14 additional articles were excluded during this process. Agreement regarding exclusions was high (84%), with only 2 disagreements, which were then discussed with PL to reach a final decision. The reference lists of the remaining papers were scanned and papers known to the authors were included, which identified 3 additional studies, bringing the total number of studies informing this review to 20.
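The screening counts reported above can be checked as a simple running tally from identification to inclusion. This is a minimal Python sketch; the removal counts (614 duplicates and 2069 records excluded at title and abstract screening) are derived by subtraction from the reported totals rather than stated in the text, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScreeningStage:
    label: str
    change: int  # negative = records removed, positive = records added


def run_flow(start: int, stages: list) -> int:
    """Apply each screening stage in order, printing the running record count."""
    total = start
    print(f"{'Records identified':<45}{total:>6}")
    for stage in stages:
        total += stage.change
        if total < 0:
            raise ValueError(f"negative record count after '{stage.label}'")
        print(f"{stage.label:<45}{total:>6}")
    return total


# Counts from the text; removal counts are derived by subtraction.
stages = [
    ScreeningStage("After removing duplicates (-614)", -614),
    ScreeningStage("After title/abstract screening (-2069)", -2069),
    ScreeningStage("After full-text review (-14)", -14),
    ScreeningStage("After adding reference-list studies (+3)", +3),
]

included = run_flow(2714, stages)
```

Keeping each stage as a signed change makes it easy to see that 2714 identified records reduce to 2100, then 31, then 17, before the 3 additional studies bring the total to 20.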

Data Extraction
The following data were extracted (RB and RO) from the 20 eligible articles: year, authors, study sample, target substance, consumption measure, length of intervention, description of the intervention and control conditions and associated sample sizes, assessment times, summary of statistical evidence, and effect sizes where possible (Tables 1 and 2). As indicated earlier, the only outcome data extracted were consumption variables, given that consumption is the most consistently reported outcome, which enables comparability between studies. Accordingly, means and SDs for all consumption outcomes for each group at baseline, post intervention, and follow-up (if reported) were extracted, as well as the relevant statistical data and effect sizes (if reported). Some information was obtained by contacting authors, and this is noted in Table 2. Retention and usability data were also extracted to inform a discussion of the feasibility of delivering interventions for substance use via an app (Table 3). Considerable data are reported in these tables; hence, only brief summaries are given in the text.

Table 1 (selected entries)

Earle et al [37]; "CampusGANDR"
App intervention 1: "CampusGANDR" (PNF+) uses normative feedback and peer judgement. The game is played weekly with peers, whereby participants answer one alcohol-related and one nonalcohol-related question about their behavior. After 4 days, they receive normative feedback (ie, how their responses compared with their peers) and reflective evaluations from other students (ie, how they were judged by their peers). App use: questions and feedback delivered once each per week over 6 weeks. Adjunct components: none.
App intervention 2: PNF. Same as app intervention 1, with only normative feedback on alcohol questions (reflective evaluations were given for nonalcohol-related questions).
Comparison: Same as app intervention 1, however, with no normative or reflective feedback on alcohol questions (normative feedback and reflective evaluations were given for nonalcohol-related questions in this condition). Comparison includes app? Yes.

Gajecki et al, 2017 [39]; "TeleCoach"
Sample and target substance: University students reporting excessive alcohol consumption (>9 drinks per week for women; >14 for men); alcohol.
App intervention: "TeleCoach": (a) reporting of alcohol consumption for a week; (b) brief feedback and psychoeducation; (c) relapse prevention skills training, guided relaxation, and mindful "urge-surfing." App use: no notifications, instructed to use at will. Adjunct components: eBAC app providing real-time feedback for 6 weeks before the intervention, with access during the intervention.
Comparison includes app? No. However, as in the intervention condition, participants had access to another eBAC app for 6 weeks before, with access during the intervention.

Gustafson et al, 2014 [17]; "A-CHESS"
Sample and target substance: Individuals who met the DSM-IV (Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition) alcohol dependence criteria; alcohol.
App intervention: A-CHESS consists of (a) access to counselors, (b) a panic button related to relapse, (c) meditation, (d) recovery stories, (e) meeting locations, (f) recovery information, and podcasts. App use: no notifications, instructed to use at will. Adjunct components: residential treatment.
Comparison: TAU (treatment as usual; support offered through the residential service). Comparison includes app? No.

"mCM"
Intervention: mean 45.5 (SD 11.5) years; 28
App intervention: "mCM": (a) using a CO (carbon monoxide) device to check abstinence; (b) using the camera to record the CO reading; (c) financial reward for each uploaded video showing an "abstinent" CO reading, with a progressive reinforcement schedule. App use: twice-daily notifications for 4 weeks. Adjunct components: (a) two smoking cessation counseling sessions; (b) nicotine replacement therapy, low-nicotine cigarettes, and bupropion; (c) 6 calls to assist with motivation; (d) an additional 2 weeks of mCM app use, but without financial compensation.
Comparison: Same app as intervention, but using noncontingent compensation based on submitting videos of the CO monitoring process, regardless of a positive or negative CO reading. Comparison includes app? Yes.

"Brief-MP"
App intervention: "Brief-MP" consists of five audio-guided mindfulness sessions on (a) "urge-surfing" the craving and (b) mindfulness of the breath, body, thoughts, and emotions. Five daily assessments probed craving, mindfulness, and affect. App use: asked to meditate once per day; four random daily assessment notifications and one following the meditation session. Adjunct components: none.
Comparison: Same as intervention, except with sham-meditation recordings (eg, nonjudgmental awareness replaced with self-evaluation). Comparison includes app? Yes.

Table 2 (selected entries and footnotes)

Witkiewitz et al [48]; "BASICS-Mobile": number of days with drinking and smoking over 7 days.

a All effect sizes are Cohen d. The sign of the effect size indicates agreement with the hypothesized direction (positive implies the app condition improved the outcome to a greater degree than the comparison conditions; ie, a reduction in consumption or an increase in rates of abstinence).
b Where effect sizes were not reported as Cohen d, they were converted from the reported effect sizes where possible or derived using pooled baseline SDs from the intervention and control groups, as described by Morris [49].
c Good quality=all criteria in the Cochrane Risk of Bias tool were met; fair quality=one criterion not met or two criteria unclear, with the assessment that this was unlikely to have biased the outcome and no important limitation that could invalidate the results; poor quality=one criterion not met or two criteria unclear, with the assessment that this likely biased the outcome and important limitations that could invalidate the results, OR two or more criteria listed as high or unclear risk of bias.
d Studies in italics reported significant outcomes for the intervention app at post-intervention and/or follow-up timepoints compared with control. Sample sizes reflect the number of participants included in the final analyses.
e IRR: incidence rate ratio.
f The amount in grams of pure alcohol in one standard drink varies across countries and is indicated in brackets.
g OR: odds ratio.
h Some data provided directly by authors.
i ITT: intention-to-treat analysis (referred to in the publication as "Missing Equals Smoking").
j PP: per protocol analysis (referred to in the publication as "Follow-up Only").

Table 3 (selected entries)

Engagement: proportion of days used out of total possible days; satisfaction: 7 items rated 1 (low) to 5 (high).

Baskerville et al, 2018 [31]; Crush the Crave
Measures: four satisfaction items (used frequently, easy to use, well laid out, and confidence in using) measured on a 5-point Likert scale ("strongly agree" to "strongly disagree"), an overall satisfaction item using the same scale, and an overall helpfulness item on a 10-point Likert scale.

HealthCall
Measures: ... safety and privacy, effects on recall and knowledge of own drinking patterns, motivation and self-confidence to reduce consumption, and the app's ability to prompt drinking goals.
Results: HealthCall and graphs increased perceived benefit of the app (91.89%). Of the 30 daily suggestions for cutting down drinking, 13 were rated as "helpful"/"very helpful" by over half the patients.

Hides et al, 2018 [43]; Ray's Night Out
Measures: Mobile Application Rating Scale (5-point rating scale, 23 items), assessing engagement, functionality, aesthetics, and information quality.
Results: Participants reported that they were unlikely to pay for the app: 1.25 (0.69), and gave it a 3 out of 5-star rating: 3.13 (0.76).

Liang et al, 2018 [46]; S-Health
Measures: seven usability questions (5-point scale), assessing ease of use, recall feasibility, willingness to provide responses, etc.
Results (intervention group, "strongly agree" or "agree"): "The survey questions were easy to understand" (55.3%); "I was comfortable answering these questions" (68.1%); "I was able to remember the number of days or frequency using alcohol or drugs in the past week" (53.2%); "The smartphone screen was easy to use" (72.3%); "I prefer to answer these questions myself on a cellphone instead of having a person ask me" (46.8%).

A-CHESS
Results: 93.5% accessed the system during the first week after leaving treatment. The A-CHESS services used by the greatest percentage of participants included discussions, my

Data Synthesis and Analyses
As stated, a decision was made a priori to not conduct a meta-analysis. Nonetheless, we report Cohen d effect sizes to enable meaningful comparisons and interpretations (Table 2). When effect sizes were not reported in the paper, we computed them using pooled baseline SDs from the intervention and control groups, as described by Morris [49].
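The pooled-baseline-SD calculation can be sketched in Python as follows. It assumes the pretest-posttest-control estimator described by Morris [49] (the difference in mean change between groups divided by the pooled baseline SD) and adopts the sign convention stated in the Table 2 footnotes, so that a positive d means the app condition improved more. Morris's estimator includes a small-sample bias correction; because the text does not say whether it was applied here, the sketch makes it optional. The function names and example figures are hypothetical.

```python
import math


def pooled_baseline_sd(sd_t: float, n_t: int, sd_c: float, n_c: int) -> float:
    """Pooled pretest SD across the intervention (t) and comparison (c) groups."""
    return math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))


def morris_d_ppc2(pre_t, post_t, sd_pre_t, n_t,
                  pre_c, post_c, sd_pre_c, n_c,
                  lower_is_better=True, small_sample_correction=True):
    """Pretest-posttest-control effect size using the pooled baseline SD.

    Positive values mean the app group improved more than the comparison
    group (for consumption outcomes, a larger reduction).
    """
    sd_pre = pooled_baseline_sd(sd_pre_t, n_t, sd_pre_c, n_c)
    diff_in_change = (post_t - pre_t) - (post_c - pre_c)
    d = diff_in_change / sd_pre
    if lower_is_better:
        d = -d  # flip so that a greater reduction in consumption is positive
    if small_sample_correction:
        df = n_t + n_c - 2
        d *= 1 - 3 / (4 * df - 1)  # bias-correction factor c_p
    return d


# Hypothetical example: drinks per week fall from 20 to 14 (app) vs 20 to 18 (control).
print(f"{morris_d_ppc2(20, 14, 10, 50, 20, 18, 10, 50):.3f}")  # 0.397
```

Dividing by the baseline SD rather than the posttest SD keeps the denominator unaffected by the intervention itself, which is the main reason this estimator is often preferred for pretest-posttest-control designs.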

Design and Target Sample
A total of 20 studies met the inclusion criteria for the review. Eighteen studies were randomized controlled trials (RCTs), and two were matched controlled studies [39,41]: in each, the control group was drawn from a related trial conducted by the same research group, matched on the main measures, and recruited using the same eligibility criteria. Specifically, in the study by Gajecki et al [39], participants in the comparison condition were from an assessment-only control group in a concurrent study that had the same eligibility criteria and were matched on alcohol consumption. In the study by Hasin et al [41], the matched controls were the control participants from a previous randomized trial with the same eligibility criteria.

App Content, Complexity, and Supportive Components
Apps differed substantially in their intervention content. Most apps were stand-alone, but 8 [17,30,31,39,41,42,45,46] had additional adjunct components such as supportive counseling; motivational interviewing; educational messages; links to resources; peer group supports such as Facebook groups; nicotine replacement therapy; audio-guided relaxation; and even a high-risk patient locator, which alerts patients when they are approaching a high-risk drinking location (Table 1). In terms of complexity, at least half of the apps consisted of multiple intervention elements (eg, [17,34,35,48]), whereas others were simple and employed a single distinct intervention such as approach-bias training [44,47]. Apps also varied substantially in the theoretical approaches underlying their interventions (motivational interviewing, approach-bias modification, meditation, acceptance and commitment therapy, and relapse prevention). Finally, the majority of apps included self-monitoring of substance use as part of the intervention, and only 1 app (Drink Less) directly tested the intervention components.

Comparison Conditions
The comparison conditions were highly diverse. Specifically, Crane et al [34], Crane et al [35], Earle et al [37], Hertzberg et al [42], Kerst and Waters [44], and Ruscio et al [47] used a variant or minimal version of the intervention app, controlling for some key intervention component; Bricker et al [33] used an unrelated app; Boendermaker et al [32] and Hasin et al [41] used the same intervention but not delivered by an app; Gonzalez and Dulin [40], Liang et al [46], and Davies et al [36] used a different, nonapp substance use intervention; Aharonovich et al [30] and Gustafson et al [17] used the adjunct or treatment as usual intervention as the comparison condition (ie, minus the app); Baskerville et al [31] used information material and links to resources; and 5 studies used a waitlist, passive control, or assessment-only condition [36,38,39,43,48].

Intervention Duration, Time to Follow-Up, Notifications, and Frequency of Contact
The intervention duration was generally short to medium (1 to 8 weeks), with the exception of Gajecki et al [39] (12 weeks) and Gustafson et al [17] (35 weeks). Smoke Free and Drink Less did not specify an intervention length. Aside from 2 studies [17,42], no follow-up assessments were conducted after the end-of-treatment measures. Most apps employed assessment and/or intervention notifications or alerts; however, 6 apps did not use notifications and instead asked participants to use the app at will or during specific events such as when drinking alcohol [17,32,33,38,39,43]. For the apps that did employ notifications, the most common schedule was once per day. Four apps used more frequent notifications: 2 times per day [42], 3 times per day [48], and 4 times per day [44,47]. One app employed a single weekly email reminder [33].

Effectiveness Outcomes
Cohen d effect sizes were extracted or calculated for each substance consumption outcome. For five studies where outcome data did not conform to the requirements for the calculation as described by Morris [49], effect sizes were computed by converting from the reported effect size to Cohen d, as indicated in Table 2. In three cases, insufficient data were provided (despite requests to authors) to calculate a pre-post effect size.
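The text does not specify how reported effect sizes (such as the odds ratios and incidence rate ratios listed in the Table 2 footnotes) were converted to Cohen d. As one possibility, odds ratios are often converted with the logistic-distribution method, d = ln(OR) * sqrt(3) / pi; the Python sketch below assumes that method and is not necessarily the conversion the authors used. The function name is hypothetical.

```python
import math


def log_odds_ratio_to_d(odds_ratio: float) -> float:
    """Convert an odds ratio to Cohen d with the logistic-distribution method.

    d = ln(OR) * sqrt(3) / pi, which assumes the dichotomous outcome reflects
    an underlying continuous variable with a logistic distribution.
    """
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return math.log(odds_ratio) * math.sqrt(3) / math.pi


# Hypothetical example: odds of abstinence twice as high in the app group.
print(f"{log_odds_ratio_to_d(2.0):.2f}")  # 0.38
```

An OR above 1 for abstinence yields a positive d, matching the Table 2 sign convention; for an OR describing the odds of continued use, the sign would need to be flipped.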
A minority of studies (6/20; Table 2) reported significant reductions in substance use compared with the comparison group at post treatment or follow-up.
Of the 6 apps that reported superior outcomes (compared with controls) at post treatment, 3 targeted alcohol: one was delivered to clinical participants (A-CHESS), and the other two focused on university students (TeleCoach and CampusGANDR). One app was delivered to smokers (Smoke Free), and another targeted both alcohol and smoking in a university population (BASICS-Mobile); in the latter study, however, only the smoking reductions (not alcohol) were superior to the comparison condition. Finally, one app (HealthCall) targeted illicit drugs and alcohol in an HIV population, but only a significant reduction in drug use (not alcohol) was reported in comparison with controls. It should be noted that most apps did report reductions in substance use; however, these were not necessarily superior to the control conditions post treatment. In addition, three apps were categorized as showing promise: Brief-MP (a smoking app) and LBMI-A (an alcohol app with a clinical group) reported intervention effects during treatment but not post treatment, and Drink Less reported a significant reduction in alcohol consumption with a combination of two components (normative feedback and cognitive training).
Witkiewitz et al [48] reported on "BASICS-Mobile," which delivered monitoring, normative feedback, health information, alternative activities, and "urge-surfing" over 14 days, focusing on alcohol and smoking reduction in university students. The app performed better than a minimal control condition for cigarettes smoked per day at post intervention (d=0.55), but no intervention effect was found for alcohol. Gustafson et al [17] showed that "A-CHESS," which delivered psychoeducation, recovery stories, meeting locations, guided meditation, and access to phone counselors over 8 months, performed better than treatment as usual for 30-day alcohol abstinence (d=0.37) and number of risky drinking days (d=0.24). Aharonovich et al [30] showed that, alongside motivational interviewing, "HealthCall" (an app delivered over 8 weeks employing motivational self-monitoring, personalized feedback, and the option to call a phone counselor) produced significantly lower rates of primary drug use than a motivational interviewing-only control group (d=0.17). Gajecki et al [39] reported that "TeleCoach" (an app that delivered alcohol monitoring, personalized feedback, alcohol guidelines and risk situations, drink refusal skills, and "urge-surfing" over 3 months) was associated with a reduction in drinking occasions (but not quantity) compared with a waitlist group (d=0.30). Earle et al [37] reported that "CampusGANDR" (a campus-based game that primarily centered on normative and injunctive feedback over 6 weeks) was associated with a reduction in the number of drinks consumed over a weekend compared with a control app that provided feedback on activity reports unrelated to drinking (d=0.23).
Crane et al [35] reported that "Smoke Free," an app delivering goals, monitoring, daily messages reporting accrued benefits (eg, financial savings and estimated health improvements), and behavior change strategies over a 30-day period, was associated with higher 3-month continuous smoking abstinence rates than a minimal version of the app (d=0.22 using per protocol analysis and d=0.34 using intention-to-treat analysis; see Multimedia Appendix 1 for details).
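The per protocol and intention-to-treat estimates differ in how participants lost to follow-up are counted, which matters most when follow-up retention is low. The minimal Python sketch below illustrates the two rules as labelled in the Table 2 footnotes ("Missing Equals Smoking" for ITT and "Follow-up Only" for PP); the data are hypothetical and the function name is illustrative.

```python
from typing import Optional, Sequence, Tuple


def abstinence_rates(outcomes: Sequence[Optional[bool]]) -> Tuple[float, float]:
    """Return (ITT rate, per protocol rate) of abstinence for one study arm.

    outcomes: True = abstinent at follow-up, False = smoking,
    None = lost to follow-up.
    ITT ("Missing Equals Smoking") keeps everyone and counts None as smoking.
    PP ("Follow-up Only") drops participants with missing data.
    """
    if not outcomes:
        raise ValueError("no participants")
    n_abstinent = sum(1 for o in outcomes if o is True)
    n_followed = sum(1 for o in outcomes if o is not None)
    itt = n_abstinent / len(outcomes)
    pp = n_abstinent / n_followed if n_followed else float("nan")
    return itt, pp


# Hypothetical arm: 10 enrolled, 4 followed up, 2 of those abstinent.
arm = [True, True, False, False] + [None] * 6
print(abstinence_rates(arm))  # (0.2, 0.5)
```

With low retention, the two rules can give very different abstinence rates for the same arm, so between-group effect sizes computed from them need not agree.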
In addition, we categorized three further apps as showing promise. Brief-MP and LBMI-A were associated with significantly greater reductions in the intervention arm during treatment, although this difference was no longer significant at post intervention (see Multimedia Appendix 1 for details) [40,47]. Although Drink Less showed no overall difference between the intervention and control app, a significant interaction was found between the normative feedback and cognitive training components within the intervention group only, suggesting that these two components in combination produced a greater decrease in alcohol consumption than the minimal version of the app (see Multimedia Appendix 1 for details). Given that this analysis was exploratory (although prespecified), we await further research before drawing any conclusions.
Studies used various methods to ascertain usability and user satisfaction, ranging from reliable instruments such as the Mobile Application Rating Scale (MARS) [53] to a single item (Table 3). Satisfaction ratings ranged from moderate (50%-80%) [33] to high (80% and over) [30,41]. For example, Hides et al [43] used the MARS and found that Ray's Night Out had good objective app quality and high (80% and over) levels of functionality, aesthetics, and information quality. Hasin et al [41] reported high satisfaction, with 86% of patients stating that HealthCall-S reminded them of their drinking goal and over 80% stating that it increased their confidence and motivation to reduce drinking. In the study by Bricker et al [33], 59% said they were satisfied overall.
In summary, all 6 apps that were significantly more effective than their comparison conditions had small to moderate effect sizes. Moreover, 3 of the 6 app studies were assessed as having a high risk of bias and 3 as having a low risk of bias; hence, no particular pattern emerged regarding outcomes and bias. When multiple substance consumption measures were reported, significant outcomes were often inconsistent across measures. Further details of each study are provided in Multimedia Appendix 1 and the tables.

Discussion
The primary aim of this paper was to report an up-to-date systematic literature review of the effectiveness of substance use (alcohol, illicit drugs, or tobacco) interventions delivered via mobile apps. A total of 20 studies were included in the review, of which only 6 reported significantly greater reductions in substance use post intervention compared with comparison groups [17,30,35,37,39,48]. The average effect sizes were modest, consistent with findings for mobile apps in other fields, including mental health [15] and diet and exercise [54]. Two further trials [40,47] reported significant intervention effects during the treatment phase, with no significant group differences at post intervention. A third app reported a significant interaction between two intervention components within the app [34]. The 6 apps that performed significantly better than their comparison conditions varied substantially in intervention length, content, and complexity, and few commonalities emerged. In terms of app content, 3 of the 6 apps included normative feedback, and 1 app included personalized feedback (actual consumption compared with goals). Specifically, CampusGANDR relied heavily on personalized normative feedback and injunctive feedback (what peers think you should do); TeleCoach provided personalized normative feedback immediately following consumption reports; BASICS-Mobile delivered normative feedback every day; and HealthCall included personalized feedback comparing actual consumption with personal goals. Interestingly, in the earlier study by Gajecki et al [38], the comparison condition, which included personalized normative feedback, performed better than the intervention, which did not deliver normative consumption feedback.
This apparent association between personalized and normative feedback and positive outcomes within mHealth approaches to substance use reduction is consistent with previous face-to-face interventions demonstrating the effectiveness of these approaches [55]. Given the known importance of peers and normative attitudes in relation to substance use, including this component in future apps (particularly for young populations) may enhance efficacy.
In addition, intervention length may have played a role in influencing positive outcomes, with only 1 of the 6 interventions highlighted above lasting less than 6 weeks (intervention durations of 4, 6, 8, 12, and 35 weeks). In contrast, most of the studies that did not report intervention superiority ran for 4 weeks or less. Although this is only suggestive, behavior change via an app may be more effective when the intervention lasts longer than 4 weeks and participants engage for longer periods.
Retention rates were generally high across studies, with the majority showing above 90% retention at postintervention or follow-up. The exceptions were the Smoke Free [35] and Drink Less [34] apps, where retention at follow-up was less than 30%. These two studies differed from the rest in that people enrolled in the study after having downloaded the app, rather than being recruited to a trial from the outset. This suggests that retention may be poor for apps used outside of research trials, and methods that enhance retention are of utmost importance if apps are to be effective as a public health approach. Moreover, 9 of the 20 studies reported usability data, with somewhat variable results. Encouragingly, participants generally experienced mHealth apps as easy and convenient to use. Considering that poor usability is common among smartphone apps and can substantially compromise user engagement [56], these results are promising.
Four of the apps (LBMI-A, A-CHESS, HealthCall, and Health S) targeted clinical samples, primarily alcohol-dependent individuals, except for Health S (heroin addiction). Only 1 of the 4 (A-CHESS) reported superior outcomes [17]. This app intervention was for alcohol-dependent individuals who had already been in residential treatment, and hence, it functioned as a relapse prevention program. Furthermore, A-CHESS included adjunct components such as contact with counselors when required, indicating that the app alone was likely not responsible for the intervention effect. In addition, this study likely has some risk of bias, as it would have been clear to participants that they were in the intervention group, given that the comparison group received treatment as usual with no additional support. Given the complexity of alcohol and drug dependence, it is not surprising that a mobile app may not produce significant positive outcomes for clinical samples, particularly because all of these interventions except A-CHESS were 6 weeks or less in duration. Although it is too early to draw any firm conclusions, if mobile apps are helpful for those who are dependent on substances, they are likely to be most effective as posttreatment support rather than as the primary intervention.
Interestingly, of the 7 apps that targeted tobacco use (including BASICS, which targeted both alcohol and tobacco), the only 2 reporting superior outcomes were Smoke Free and BASICS, which had no lower limit on smoking level. In contrast, the other trials included daily smokers, some of whom smoked 10 cigarettes a day, which would be considered within the mild to moderate dependence range. Once again, this suggests that individuals with heavier substance use (even if not clinical) are less likely to benefit from mobile apps.
Given the small numbers, it is not possible to draw any conclusions regarding effectiveness in relation to the type of substance, although there is some suggestion that mobile apps are less effective with dependent individuals. Further studies are needed to confirm this suggestion.
Finally, it is important to note that the reductions in substance use produced by some of the app interventions were small in absolute terms. For example, compared with the comparison conditions, the app conditions with significant consumption outcomes produced mean reductions of 1 fewer day of drug use over 30 days [30], 0.8 fewer drinking days per week [39], a 5% increase in the likelihood of being abstinent [17], and 1 fewer drink over a weekend [37]. Nonetheless, even small reductions at a population level can have a significant impact on the mortality and morbidity associated with problematic alcohol and other drug use and thus remain encouraging. Furthermore, although the majority of studies did not report "superior" outcomes relative to their comparison conditions, many reported significant decreases in alcohol, illicit drug, or tobacco use. This is to be expected in study designs in which the comparison conditions are other apps or interventions delivered via other digital modalities (web-based and IVR) that are also known to have a positive impact on substance reduction. In this respect, the mHealth field would benefit from greater consensus and clarity regarding the expectations of app efficacy and the role mobile apps should play in clinical treatment and public health approaches.

Limitations
Numerous limitations were apparent in the included studies. For example, most studies were affected by design limitations and risk of bias, and many were small pilot studies. In particular, studies varied considerably in sample size, with many small studies (eg, eight intervention conditions had samples of 30 or fewer); in contrast, two studies had samples greater than 1000.
One of the most significant limitations was that comparison conditions varied considerably and, in many cases, were poorly balanced with the intervention condition. This variability in design reflects distinct kinds of research questions and precludes drawing any conclusions (tentative or otherwise) about the effectiveness of apps compared with other modes of delivery for problematic substance use. Similarly, the 6 apps reporting superior outcomes compared with controls are confounded by substantially different kinds of comparison conditions: some were likely to have very little therapeutic benefit (eg, waitlist control), whereas others comprised a "treatment" as comprehensive as the intervention condition (eg, a web-based version of the same intervention). In addition, some studies included comparison conditions that were poorly balanced in terms of content and frequency of contact [17,30,46]. For example, in one study [17], participants in the treatment group had more counselor contact and completed a weekly assessment of alcohol intake that was not delivered in the comparison condition, which may have produced an assessment effect. The Bricker et al study [33] included unspecified adjunct therapies (intervention group participants were encouraged to use other therapies alongside the trial, with no reporting of the details).
Finally, the risk of bias was generally high. At times, this was due to a lack of detailed information, so the true risk of bias may be lower across the studies. Overall, only 6 studies were classified as having either no or low risk of bias (Table 2; Multimedia Appendix 1). A further 7 studies were assessed as potentially biased, as a lack of information did not allow them to be classified as low risk. The remaining 8 studies were assessed as having a high risk of bias. Moreover, 3 of the 6 studies reporting superior outcomes were assessed as having no or low risk of bias, suggesting that half of the significant findings are robust. Of the remaining 3 superior trials, 1 had some risk of bias, owing to unclear descriptions, and 2 had a high risk of bias.

Future Recommendations
This review highlights a number of key areas for future work in this fast-growing area. First, we clearly need sufficiently powered trials with longer follow-up periods and greater attention to reducing the potential risk of bias. An increasing focus on protocol papers, preregistered trials, and adherence to Cochrane guidelines (and reporting thereof) will enable stronger conclusions to be drawn in the next review.
Second, this review highlighted considerable variability in app content and complexity across a range of substances, as well as inadequate descriptions of app content within publications. Furthermore, only about half of the studies included descriptions of the user experience, which is critical to consider alongside the effectiveness data. If engagement and satisfaction from the user perspective are low, then effectiveness outside of trial settings is likely to be very low. The lack of usability data can be partly explained by word constraints and the reluctance of some journals to publish "user experience" papers. Thus, we recommend that the field engage with the Open Science Framework and similar platforms when providing details of app content, theories of change, and design.
In some cases, the development of app interventions was clearly described within the context of a theory of change for substance use reduction (eg, within the papers reporting on Smoke Free, Drink Less, and BASICS-mobile). However, in many cases, it was unclear what the proposed mechanism of change was and why it was chosen, and at times, it was difficult to ascertain the content of the intervention. In some cases, the rationale was to transfer "effective" face-to-face treatments to mobile apps (eg, BASICS-mobile), whereas other authors developed bespoke app interventions dependent on user input and the unique aspects of smartphone technology (eg, Ray's Night Out). Ultimately, although the substance use field has now produced 20 controlled evaluations of mobile apps, it remains unclear which "types" of interventions are likely to be most effective and which theory of change models underpin them. Finally, in most cases, except for the Drink Less app, there was little investigation of the effectiveness of individual intervention components. The positive interaction between cognitive bias training and normative feedback found in the post hoc analyses of the Drink Less app is promising, given the ease with which both of these components can be translated into a mobile app. Furthermore, cognitive training is a habit-forming activity and is well suited to an intervention that can easily be attended to on a smartphone at any time. Importantly, some behavior change interventions may be better aligned with digital delivery than others. For instance, we have recently proposed that a time-based goal-setting technique, rather than traditional count-based goals (to reduce smoking or alcohol or drug use), could be substantially enhanced by the unique capabilities of app functionality [57]. This might include daily reminders, timed alerts, automated reduction goals, supportive psychological strategies, and personalized delivery of interventions.
As we continue to make further technological advancements in app delivery, well-aligned intervention content will be critical to the success of mHealth. A fine-grained analysis of the content of mobile apps in this field would be a helpful exercise in future publications.
Third, it was surprising that many of the publications did not describe iterative co-design processes (Smoke Free, Drink Less, and Ray's Night Out were among the exceptions). Although such descriptions may have been omitted in some cases due to manuscript length constraints, usability testing before evaluating an app in larger RCTs is critical. Such testing allows researchers to modify the functionality based on user and clinician feedback, thereby avoiding inefficient or highly limited RCTs. Greater emphasis on co-design and usability testing will also enhance our ability to improve retention. For example, we know that therapist guidance reduces attrition in digital interventions, but this can be costly. One possibility would be to trial automated guides or coaches to provide support and reduce attrition. Furthermore, personalized reminders and strategies, machine-learning functionality, passive-sensor reporting, and context-based reminders have the potential to increase retention, among other potential uses. However, none of the apps in this review incorporated these more complex and sophisticated technologies. This was surprising and may partly reflect funding constraints. Although evidence is lacking as to what level of collaboration is appropriate and at which point in the design process, greater interdisciplinary and co-design collaboration, including users, researchers, clinicians, software developers, policy makers, marketing teams, and graphic designers, is likely to produce more sophisticated products that leverage these capabilities in the context of university research trials.
Finally, the considerable range of comparison conditions was a major limitation of this review, and the rationale for some of the chosen comparison conditions was somewhat perplexing. Each type of comparison condition reflects a different research question and implies a quite different purpose for an app focused on substance reduction. Researchers could consider whether their trial seeks to determine if the tested app will produce superior effects to an identical, similar, or different app intervention; a computer intervention; a face-to-face intervention; treatment as usual; or no intervention. At a public health level, we propose that if an app reduces substance use to the same degree as a more costly intervention, this should be considered a positive trial outcome. This issue was not discussed in most of these papers, and it points to a broader policy discussion about what constitutes "app effectiveness" to ensure transparent communication with the public. Such considerations are important, given the potentially broad reach of apps in remote and financially disadvantaged communities and in addressing numerous other barriers to help seeking. The current ubiquity of smartphones in our society makes them a powerful mHealth tool, but further work is needed to understand how best to harness their capabilities, engage the user, and generate positive intervention outcomes. With hopeful anticipation, we look forward to what the next 5 years of mHealth research and development will bring.

Conclusions
It has been approximately 10 years since substance use interventions delivered via smartphone apps became available, and the majority of controlled evaluations have been published in the last 5 years. As we are likely to see an acceleration in the development of smartphone app substance use interventions over the coming years, it is timely to take stock of the field and identify strengths, limitations, and future directions. This state-of-the-art review highlights the diversity in app design, with a range of options being explored for both community and clinical populations. The review also highlights substantial variability in study design, intervention types, comparison conditions, measures, follow-up periods, intervention length, and reporting details, making it almost impossible to infer factors or themes associated with the effectiveness of substance use apps specifically. We see this review as a stock-taking moment; we are clearly not at the point where firm conclusions can be drawn. Importantly, the details and outcomes of this review will hopefully provide guidance that strengthens the mHealth field in its future endeavors to assist individuals in the community to reduce their problematic consumption of alcohol, tobacco, and/or illicit drugs. Ultimately, we hope that mHealth can provide affordable, accessible, and effective behavior change interventions in this field.
Co-design is critically important in all intervention development; however, many studies do not involve users until after important design and intervention decisions have been made. To determine whether an app intervention is equivalent or superior in efficacy to other formats, the app should be tested against the same intervention delivered in a nonapp comparison condition. Ultimately, comparison conditions should be selected based on the fundamental research question. In addition to app-specific functionality that can be leveraged to produce innovative interventions, apps that demonstrate at least outcome equivalence with face-to-face treatment or treatment as usual would offer numerous advantages, including low cost, accessibility, reduced barriers to help seeking, and potentially higher engagement. Relatedly, efficacious app interventions that can recruit individuals otherwise unwilling to seek help would offer substantial advantages in addressing the treatment gap. Indeed, app interventions have generated considerable interest in public health research, with some promising signs emerging from mental health apps [15] (although see the study by Weisel et al [16]). However, a similar story cannot yet be told for apps focused on helping people reduce problematic alcohol, tobacco, and/or illicit drug use. Although the field is still in its infancy, this review cautiously suggests that app interventions for problematic substance use have yet to clearly demonstrate their utility. In particular, and not surprisingly, this seems to be the case for clinical or heavier users of substances.
A more positive state of the literature in the next review is likely to be enabled by greater collaboration between multidisciplinary teams, iterative learning from each other's products, selecting evidence-based and mobile app-aligned content, greater expert and consumer input, attention to reducing risk of bias, comprehensive usability testing, more personalized interventions, and methods that leverage greater user engagement and retention.