Just-in-time Adaptive Mechanisms of Popular Mobile Applications for Individuals with Depression: Systematic Review

Background: There is an increasing number of smartphone applications (apps) focusing on prevention, treatment, and diagnosis of depression. A promising approach to increase the effectiveness while reducing the individual’s burden is the use of just-in-time adaptive intervention (JITAI) mechanisms. Objective: With this work, we systematically assess the use of JITAI mechanisms in apps for individuals with depression. Methods: We systematically searched for apps addressing depression in the Apple App Store, the Google Play Store, and in curated lists from the Anxiety and Depression Association of America, the United Kingdom National Health Service, and the American Psychological Association in August 2020. Relevant apps were ranked according to the number of reviews (Apple App Store) or downloads (Google Play Store). For each app, two authors separately reviewed all publications concerning the app found within scientific databases (PubMed, Cochrane Register of Controlled Trials, PsycINFO, and Google Scholar), publications cited on the app’s



Introduction
Worldwide, at least 246 million people suffer from depression each year [1], and depression is the leading cause of years lived with disability [2]. While effective treatments for depression exist [3][4][5], most individuals in need still do not receive them [6], or those obtaining treatment do not benefit. People seeking help often face barriers such as high treatment costs, a shortage of trained clinicians, the stigma associated with seeking help, and accessibility difficulties [7][8][9][10].
Mobile applications (apps) may have the potential to address the rising prevalence of depression and the insufficient resources available for its treatment [11,12]. Apps are already an integral part of most people's everyday lives [13], and the threshold for engagement with apps is assumed to be low, resulting in prompt, flexible, portable, and anonymous treatment [14]. Individuals who are otherwise not reachable could receive treatment [15], and interventions could be delivered in economies with limited resources for mental health [16]. Several systematic reviews report small to large effect sizes, showing that apps and other digital interventions reduce symptoms of mental health problems including depression [17][18][19][20]. Finally, apps can be used in real-life situations, where behavior change is most desirable and clinicians cannot intervene [14].
The aim of this work is to complement the existing assessment of apps addressing depression by focusing on the use of just-in-time adaptive intervention (JITAI) mechanisms [21,22]. JITAIs aim to deliver an adaptive treatment (i.e. personalized/tailored) at a time of vulnerability (i.e. "person's transient tendency to experience adverse health outcomes or to engage in maladaptive behaviors" (p. 1210) [21]) and receptivity (i.e. "the person's transient tendency to receive, process, and use the support provided" (p. 1210) [21]). The tailoring of the treatment and timing is to be determined by measuring changes in relevant variables (e.g. changes in mood). While ecological momentary assessments may facilitate the detection of these states of vulnerability and receptivity, passive measurements (e.g. using the location derived from a smartphone's GPS data) are regarded as the gold standard of measurements for JITAIs. These passive measurements have the advantage of enabling an unobtrusive, continuous observation [23]. JITAIs tailoring the content to the person, situation, and time by using these passive measurements were therefore proposed to reduce the burden and increase the effectiveness of interventions [21,22].
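The interplay of vulnerability and receptivity described above can be illustrated with a minimal decision-rule sketch. This is not taken from any reviewed app or from the JITAI framework papers; the `Observation` fields, the mood scale, and the threshold of a 2-point drop are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One momentary snapshot of a person's state (all fields are hypothetical)."""
    mood: float          # self-reported or inferred mood, 0 (low) to 10 (high)
    phone_in_use: bool   # assumed proxy for receptivity
    is_driving: bool     # a situation in which support should not be sent


def is_vulnerable(obs: Observation, baseline_mood: float, drop: float = 2.0) -> bool:
    """Assumed rule: vulnerable if mood is at least `drop` points below baseline."""
    return baseline_mood - obs.mood >= drop


def is_receptive(obs: Observation) -> bool:
    """Assumed rule: receptive if the phone is in use and the person is not driving."""
    return obs.phone_in_use and not obs.is_driving


def decide(obs: Observation, baseline_mood: float) -> str:
    """Deliver support only when the person is both vulnerable and receptive."""
    if is_vulnerable(obs, baseline_mood) and is_receptive(obs):
        return "deliver_support"
    return "no_action"
```

In this sketch the tailoring variables (`mood`, `phone_in_use`, `is_driving`) could come from self-reports or from passive measurements; the decision rule itself does not depend on the source.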
Evidence for higher effectiveness of JITAIs compared to non-JITAI treatment and waitlist control groups was investigated in a recent meta-analysis [24], finding moderate to large effect sizes (Hedges' g = 1.65 when comparing to waitlist-control and Hedges' g = 0.89 when comparing to non-JITAI treatments) of primary outcomes produced by 33 empirical studies. Due to this potential of JITAIs to increase effectiveness while reducing the burden and the prominence of the JITAI framework in the scientific community, we aim to review to what degree popular apps addressing depression use JITAI mechanisms by reviewing what and how relevant symptoms of depression (e.g. mood) are measured. We are also interested in whether peer-reviewed evidence can be found that investigated increased effectiveness or efficacy of these apps attributable to the usage of JITAI mechanisms.
To this end, we systematically assessed popular apps targeting depression, i.e. apps that are most reviewed on the Apple App Store and most downloaded on the Google Play Store. We argue that the investigation of JITAI mechanisms is necessary due to their potential to increase effectiveness while simultaneously decreasing the burden on users. The focus on popular apps is important because they are listed at the top of search results and are thus very likely to be downloaded and used [25]. Moreover, a high number of downloads implies that they have been found useful by users [26], and may indicate that people continue to use or recommend them. Recent evidence also indicates that the two most popular apps for depression and anxiety were responsible for 90% of active users [27].

Search Strategy and Selection Criteria
We conducted this systematic review following the same methods used in existing reviews of popular apps addressing mental health problems. We systematically identified and reviewed apps that were publicly available in the U.S. and U.K. app stores, because investments in digital health companies in these countries ranked first (US 2019: 7.4 billion USD) and second (UK 2019: 5 billion USD) of all English-speaking countries in 2019 [28]. The Apple App Store and Google Play Store were used as they have a combined market share of ~99.4% [29]. We searched the two stores by entering the term "depression" in the search fields of the respective stores and included all apps found in both stores of both countries. We also reviewed curated lists of health apps from prominent organizations, namely the Anxiety and Depression Association of America [30], the National Health Service [31], and the American Psychological Association [32]. By doing so, we wanted to ensure that we did not miss any app recommended by important institutions and experts for mental health. The apps found on the respective lists addressed several different mental health problems. We selected only the apps addressing depression for further assessment. Searches were carried out in August 2020.
For further assessment, we included apps that (1) targeted the treatment of depression or reduction of symptoms of depression by (2) delivering at least one active ingredient and were (3) available in English. In line with previous work from Michie [33], we defined an active ingredient as a function that supports users in their management of depression and that is designed to reliably and causally change processes that govern behavior [34]. Examples of an active ingredient for depression are a goal-setting task, a breathing exercise, or a recording of daily mood. Apps targeting other mental health illnesses such as anxiety or post-traumatic stress disorder were not excluded as long as depression was addressed as well. We included both free-of-charge and paid apps. Browser-based treatments were not included. We excluded apps that only targeted professionals (e.g. Depression Psychopharmacology), only offered a diagnostic service (e.g. PHQ-9 Depression Test Questionnaire), only provided quotes or inspirational text (e.g. Depression Quote Wallpapers), or only conveyed information without the goal of eliciting behavior change or engaging with individuals (e.g. Psychology Book -1000+ Amazing Psychology Facts).
Two authors (GWT, ADF) separately reviewed each app according to the inclusion and exclusion criteria. The interrater agreement was excellent indicated by a Cohen's Kappa of 0.91. In case of disagreement, a consensus was reached via discussion. After this initial assessment, we ranked all included apps from the Apple App Store separately by their number of reviews and all included apps from the Google Play Store by their download category (e.g. 1,000,000+ and 500,000+ downloads).
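For readers unfamiliar with the agreement statistic, Cohen's kappa compares the observed agreement between two raters with the agreement expected by chance from each rater's marginal label frequencies. The following is a minimal sketch of that calculation; the function name and the example labels are illustrative assumptions and do not reproduce the actual screening data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items with identical labels.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal label proportions.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = count_a.keys() | count_b.keys()
    p_expected = sum(count_a[label] * count_b[label] for label in labels) / (n * n)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical include/exclude decisions for ten apps (not the study data).
decisions_a = ["include"] * 4 + ["exclude"] * 6
decisions_b = ["include"] * 3 + ["exclude"] * 7
kappa = cohens_kappa(decisions_a, decisions_b)
```

In this toy example the raters disagree on one of ten apps, so the observed agreement is 0.9 and the chance agreement is 0.54.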
We then separately identified the most popular apps available only from the Apple App Store, available only from the Google Play Store, or available from both stores. For apps only available on the Apple App Store, we selected the five most reviewed apps, as users rarely scroll past the first five apps [25]. For the Google Play Store, we used the download category of the app ranked fifth on the list (e.g. 500,000+ downloads). All apps in the 500,000+ download category were then included. For apps available on both app stores, we used the Google Play 500,000+ download category to determine inclusion, regardless of the number of reviews on the Apple App Store. Regardless of their number of downloads or reviews, we included all apps from the curated lists that met the inclusion criteria and did not violate the exclusion criteria.
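The selection rule can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code. It assumes that the fifth-ranked app is determined over all apps available on Google Play (including those also on the Apple App Store) and that apps in higher download categories than the threshold are included as well. The `App` fields and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class App:
    name: str
    ios_reviews: Optional[int] = None      # None if not on the Apple App Store
    play_downloads: Optional[int] = None   # lower bound of the download category,
                                           # None if not on the Google Play Store
    curated: bool = False                  # listed on one of the curated lists


def select_popular(apps: List[App], top_n: int = 5) -> List[App]:
    # Apple-only apps: keep the top_n most reviewed.
    ios_only = [a for a in apps if a.ios_reviews is not None and a.play_downloads is None]
    selected = sorted(ios_only, key=lambda a: a.ios_reviews, reverse=True)[:top_n]

    # Google Play apps (Play-only or both stores): the download category of the
    # app ranked top_n-th sets the threshold (assumption: higher categories count).
    on_play = sorted(
        (a for a in apps if a.play_downloads is not None),
        key=lambda a: a.play_downloads,
        reverse=True,
    )
    if on_play:
        threshold = on_play[min(top_n, len(on_play)) - 1].play_downloads
        selected += [a for a in on_play if a.play_downloads >= threshold]

    # Curated apps are included regardless of reviews or downloads.
    selected += [a for a in apps if a.curated and a not in selected]
    return selected
```

Using a category threshold rather than a fixed count means that the number of Google Play apps selected depends on how many apps share the fifth-ranked app's download category.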

Data Analysis
Our evaluation covered the following areas: general information about the app, potential mechanisms for delivery of JITAI, and peer-reviewed evidence. We developed our evaluation framework before reviewing the apps and used the Covidence Systematic Review software (Veritas Health Innovation Ltd., Australia, version accessed August 2020) to review the apps. All of our questions are listed in the Codebook in Supplementary Table 1-5, along with the sources from which we derived them. Two raters (GWT, ADF) separately evaluated each included app as follows: First, we gathered general information about the apps including the name of the provider, additional affiliated organizations (e.g. other companies, universities, governments, or NGOs), and time since last updated. Second, we reviewed the app's website and recorded all publications provided and information about JITAIs. Third, we searched for peer-reviewed publications on PubMed, Cochrane Register of Controlled Trials, PsycINFO, and Google Scholar using the search term [(Name AND App) OR (Name AND Application AND Smartphone)].
Fourth, we reviewed the full text of each study found on the website and in the different databases. We excluded books, theses, systematic reviews evaluating several different apps, and clinical trial registrations. After this, each study was evaluated in line with prior work [35], including the year of publication, journal name, journal impact factor, number of subjects, study purpose, and study design (i.e. randomized controlled trial, open trial). We also extracted the information available about JITAI mechanisms. We reviewed to what degree the apps could be considered JITAIs by reviewing whether and to what degree relevant features (e.g. vulnerability) derived from the JITAI concept by Nahum-Shani, Smith [22] were implemented. We assessed how the support was tailored by reviewing the symptoms of depression that were measured (derived from the ICD-10 and DSM-V), and the self-report data or sensor and device analytics (derived from related work [23], from the Android Developers Guide [36], and the iOS security Guide [37]) that were used. We also reviewed whether tailoring to traits (i.e. "tailoring-to-people" [21]) was used by checking for questions about demographics and socioeconomic status. Since JITAI mechanisms are proposed to increase the effectiveness or efficacy of apps [22], we reviewed whether the publications addressed effectiveness or efficacy and whether JITAI mechanisms were investigated in these publications.
Finally, we reviewed the app itself and extracted the information available about JITAI mechanisms. The results from each rater were compared and a consensus was reached by discussion if necessary. We reviewed each app in September 2020 and the process is illustrated in .

Results
We found 249 apps on the Apple App Store, 217 apps on the Google Play Store, 57 apps on both stores, and 135 apps on the curated lists, yielding a total of 658 apps. We removed 17 duplicates, 349 apps that did not mention depression, 123 apps with no active component, eight apps that were not accessible, one app not available in English, and one app targeting professionals. We ranked the apps found only on the Apple App Store based on their number of reviews and included the five most reviewed apps. We ranked the remaining apps found on the Google Play Store and the apps found in both stores according to their download category. The fifth most downloaded app on the Google Play Store had a download category of 500,000+. Therefore, we included all apps found on the Google Play Store and all apps available on both the Apple App Store and the Google Play Store with 500,000+ downloads, yielding 17 apps. We included six apps from the curated lists that met the inclusion criteria of mentioning depression and did not violate the exclusion criteria, yielding a total of 28 apps. A flow chart of the results from the review process is illustrated in Figure 2.
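The screening counts reported above can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text; the dictionary keys are illustrative labels, and the count of apps remaining after exclusion is derived here rather than reported.

```python
# Apps identified per source, as reported in the text.
identified = {
    "apple_app_store": 249,
    "google_play_store": 217,
    "both_stores": 57,
    "curated_lists": 135,
}

# Apps removed during screening, as reported in the text.
removed = {
    "duplicates": 17,
    "no_mention_of_depression": 349,
    "no_active_component": 123,
    "not_accessible": 8,
    "not_in_english": 1,
    "targeting_professionals": 1,
}

# Apps included after ranking by popularity, as reported in the text.
included = {
    "apple_only_top_five": 5,
    "google_play_500k_plus": 17,
    "curated_lists": 6,
}

total_identified = sum(identified.values())
remaining_after_screening = total_identified - sum(removed.values())  # derived
total_included = sum(included.values())
```

The totals of 658 identified apps and 28 included apps match the text; the 159 apps remaining after screening are the pool from which the popularity ranking was drawn.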

Publications
We found 68 peer-reviewed publications for the 28 reviewed apps (see Appendix 1 and Appendix 2). We found at least one publication for 16 apps (n = 28, 57%), at least one peer-reviewed publication investigating the effectiveness for nine apps (n = 28, 32%), and at least one peer-reviewed publication investigating the efficacy for five apps (n = 28, 18%). While 23 (34%) of the 68 publications investigated the effectiveness and 14 (21%) publications investigated the efficacy of the apps, not one publication evaluated an increase in effectiveness or efficacy attributable to JITAI mechanisms. Extracted information from all reviewed publications can be found in Appendix 2.

Apps
The 28 apps included were rated 2,808,465 times, with each app being rated on average 100

JITAI mechanisms
We found that 25 (n = 28, 89%) of the reviewed apps measured some kind of depression symptom when interacting with the app (e.g. initial assessment when starting the app). Three apps (n = 28, 11%) did not use any measurements, 20 apps (n = 28, 71%) used at least one self-report (e.g. daily report of mood changes via a rating), while five apps (n = 28, 18%) used self-reports and sensors and device analytics (e.g. taking a picture associated with a mood). Figure 3 illustrates how many depression symptoms were measured by different self-reports or sensors and device analytics for each of the reviewed apps. Mood Tools - Depression Aid measured the most depression symptoms (ten different symptoms measured) while not using any sensors and device analytics. Happify and Youper measured fewer depression symptoms (eight and four, respectively) but used two different sensors and device analytics. Our findings regarding the usage of self-reports and sensors and device analytics are summarized in Figure 5. In total, we found that a symptom was measured by a self-report or by sensors and device analytics 196 times. To measure different depressive symptoms, self-reports were used almost exclusively, at 189 times (n = 196, 96%), and sensors and device analytics were rarely used, at seven times (n = 196, 4%). The self-reports used most frequently to measure different depressive symptoms were closed questions consisting of ratings, Likert scales, and multiple-choice questions, at 151 times (n = 196, 77%). Open questions were used 38 times (n = 196, 19%). The sensors and device analytics that were used most frequently were vital signs (mostly heart rate) and the camera, each used two times (n = 196, 1% each). The symptom that was measured most frequently was mood, at 59 times (n = 196, 30%), followed by activity, which was measured 31 times (n = 196, 16%). Unhelpful beliefs and sleep were measured 23 (n = 196, 12%) and 20 (n = 196, 10%) times, respectively.
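The percentages above follow directly from the reported counts. As a minimal sketch for readers who want to reproduce them, the snippet below recomputes the shares from the numbers in the text; the variable names and the `share` helper are illustrative and not part of the original analysis.

```python
# Number of times a symptom was measured, by measurement type (from the text).
measurement_counts = {
    "closed_questions": 151,
    "open_questions": 38,
    "sensors_and_device_analytics": 7,
}

# Number of times the most frequent symptoms were measured (from the text).
symptom_counts = {"mood": 59, "activity": 31, "unhelpful_beliefs": 23, "sleep": 20}


def share(count: int, total: int) -> int:
    """Percentage of `total`, rounded to the nearest whole number."""
    return round(100 * count / total)


total_measurements = sum(measurement_counts.values())
self_reports = measurement_counts["closed_questions"] + measurement_counts["open_questions"]

self_report_share = share(self_reports, total_measurements)
sensor_share = share(measurement_counts["sensors_and_device_analytics"], total_measurements)
symptom_shares = {name: share(n, total_measurements) for name, n in symptom_counts.items()}
```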
Figure 4: Heatmap of measurements used to measure symptoms. The heatmap illustrates the number of times symptoms of depression were measured by self-reports or sensors and devices analytics summarized over the 28 reviewed apps. A darker color illustrates a higher number of occurrences, also indicated by the annotation in the cells.
When possible, we tried to match the measurement of the depressive symptom to a mechanism relevant to the JITAI concept. We were able to do so for state of vulnerability, proximal outcomes, distal outcomes, and tailoring variables. Some of the measurements could have been used for two or three JITAI mechanisms. Therefore, double counting of the symptoms and measurements for each mechanism is possible. Figure 5 illustrates our findings, including which measurements were used to measure which symptom and for which JITAI feature. The figure shows that some sensors and device analytics were not used as a passive measurement but rather to actively capture changes. For example, the camera was used as a measurement for activity by asking the users to take pictures of locations that they had been to or to take a picture of something that made them sad to describe their mood.
Figure 5: Connection between JITAI mechanisms, symptoms, and measurements. Sankey diagram illustrating for which of the different JITAI mechanisms (state of vulnerability, proximal outcomes, distal outcomes, tailoring variables) we were able to match a depressive symptom (e.g. mood), and the measurements used to capture the changes (e.g. closed question). The JITAI mechanisms are displayed in blue, depressive symptoms in orange, green, and purple, and measurements in grey. The size of the rectangle indicates the number of times the mechanism, symptom, or measurement was found. The thickness of the connection indicates the number of times a measurement or symptom was used. Some measurements have been assigned to two or three JITAI mechanisms and double counting is therefore possible.

Principal Results
We reviewed the 28 most popular or recommended apps for depression found on the Apple App Store, Google Play Store, and in curated lists of respected authorities for mental health. Regarding our main aim to investigate JITAI mechanisms, we found that not one of the reviewed apps specifically mentioned the use of JITAI mechanisms in the app, on its website, or in the identified peer-reviewed publications. We found that three apps (n = 28, 11%) did not use any measurements and 20 apps (n = 28, 71%) only used self-reports (96% of all 196 measurements were self-reports). While such self-reports can be used as "in the moment" assessments (i.e. ecological momentary assessments) that are closely related to the JITAI concept [24], we argue that they are insufficient to leverage the full potential of JITAIs. We found that five apps (n = 28, 18%) also used sensors and device analytics (4% of all 196 measurements were sensors and device analytics). However, we found that most sensors, such as the camera, were used as "in the moment" assessments or as part of an app's features and not to tailor the content or timing. Some of the apps measured depressive symptoms by self-reports when the app was first opened to determine what content should be presented (e.g. measuring the need to focus on sleep and mood). This type of static tailoring has been observed to be less effective than dynamic tailoring [38] and is in our view not sufficient for an app to be considered a JITAI.
Interestingly, we found that besides mood (30% of all 196 times a symptom was measured) or decreased activity (16% of all 196 times a symptom was measured) other symptoms of depression were measured less frequently. Given the broad variety and severity of depression [39] and the high comorbidity with other mental health problems such as anxiety [40], this focus on a subset of symptoms may not be sufficient to detect changes that might indicate a need for support. Additionally, a focus on the improvement of main symptoms (e.g. mood and activity derived from the DSM-V) may not be ample to contribute to the understanding of the complex processes involved in depression. Accurate and continuous measures of psychophysiological changes enabled by passive measurements of various physiological features (e.g. changes in breathing patterns or vital signs), may, however, improve the understanding of depression in general. Such an understanding could in turn enable an even more successful implementation of JITAIs. Our findings highlight that while the JITAI concept appears to be widely known in the scientific digital health community [24] and different studies outline the possibility of detecting changes in depression or depressive symptoms such as mood by using different passive measurements [41][42][43][44][45][46][47][48] these mechanisms, surprisingly, have not been implemented in the real world aside from baseline or progress assessments.
Related to these findings, we were interested in the degree to which effectiveness and efficacy were investigated in peer-reviewed publications, since JITAIs are proposed to increase the effectiveness or efficacy of apps. JITAI mechanisms were not investigated in any of the 68 reviewed publications. Therefore, our findings highlight that the proposed increase in effectiveness or efficacy from using JITAI mechanisms has not been evaluated in settings using real-world apps. Additionally, we found great variability in the scientific evidence for the reviewed apps despite an increased interest in digital health and several publications addressing this topic, especially within the last five years [26,49,50].

Limitations
The strengths of this study are the large number of apps initially screened, the analyses along a framework developed from existing work, and the rigorous methodology of reviewing all identified studies addressing the apps, the apps' websites, and the apps themselves. Nonetheless, it has several limitations. We reviewed the apps at a single point in time, which is a shortcoming found in related work as well. We are aware that the app stores are dynamic with constant changes [25], but a long-term review of the apps would not have been feasible. We may address this in our future work. Besides the lists we reviewed from the Anxiety and Depression Association of America, the National Health Service, and the American Psychological Association, other organizations offer a rating system or a list of reviewed apps. These include but are not limited to the American Psychiatric Association, PsyberGuide, and iMedicalApps. We did not review these lists because we expected a high number of overlaps, because not all of the apps found on the lists were reviewed (e.g. Dartmouth PATH was not reviewed on PsyberGuide, last checked 27th of January, 2021), and because not all of the apps mentioned on the lists were recommended (e.g. Mood Watch Review, with low credibility, user experience, and transparency ratings on PsyberGuide, last checked 27th of January, 2021). We, however, see the value in a central platform for reviews of mental health apps and would suggest incorporating findings regarding the use of JITAI mechanisms into the existing review criteria. Finally, the review of the apps initially included other aspects such as the usage of evidence-based treatment, conversational agents, and the revenue model. Reporting these findings would have exceeded the scope of this review.

Comparison with Prior Work
We found eleven reviews investigating different aspects of apps addressing depression. Six of these reviews assessed the content or features of the apps, with one of the six adjusting its analysis to the number of users. The remaining studies investigated usability, adherence to clinical guidelines, claims, or data sharing and privacy practices. One meta-analytic review investigated effect sizes of just-in-time adaptive interventions compared to control groups or other interventions, but this review did not focus on apps or mental health. We found no study investigating the use of JITAI mechanisms or reviewing the measurements used to capture changes in relevant features in apps. Furthermore, we did not find any studies reviewing whether real-world apps provide evidence for improving their effectiveness or efficacy by using JITAI mechanisms.

Conclusions
In conclusion, our findings indicate that due to the limited use of measurements for depressive symptoms, with the exception of self-reports as indicators for progress or initial tailoring, the 28 most popular or recommended apps addressing depression cannot be considered to be JITAIs. An increase in effectiveness or efficacy by using JITAI mechanisms was also not evaluated by any of the reviewed publications. Due to these findings, we argue that the reviewed apps do not yet leverage the full potential of digital health interventions by providing tailored support when it is most needed and in a most helpful way.
GWT designed the evaluation framework with inputs from TK and ADF. GWT and TK designed and implemented the search strategy. GWT and ADF screened and coded the apps, websites, and studies, and extracted the data. GWT analyzed the data and drafted the initial manuscript supervised by TK. BK, NCJ, ASS, LTC, and EF provided methodological guidance and feedback on the manuscript. All authors reviewed and approved the final manuscript. This work has been partially supported by the National Institute of Mental Health (R01 MH123482) and in part by CSS Insurance, Switzerland. The National Institute of Mental Health and CSS Insurance had no role in the study design, data collection, data analysis and interpretation, writing the manuscript, or reviewing and approving the manuscript for publication.
The reviewed apps and reviewed studies are publicly available. The extracted data used in this review will be shared full open access beginning three months and ending 24 months following study publication. For further questions or material request please contact gteepe@ethz.ch.

Conflicts of Interest
GWT, EF, and TK are affiliated with the Centre for Digital Health Interventions (www.c4dhi.org), a joint initiative of the Department of Management, Technology, and Economics at ETH Zurich and the Institute of Technology Management at the University of St. Gallen, which is funded in part by the Swiss health insurer CSS. EF and TK are also cofounders of Pathmate Technologies, a university spin-off company that creates and delivers digital clinical pathways. However, Pathmate Technologies is not involved in this study. NCJ and Dartmouth College are the owners of a depression and anxiety application entitled "Mood Triggers". Despite this, owning Mood Triggers is not a financial conflict of interest given that Mood Triggers is not intended to be revenue-generating, but rather used to deliver and evaluate no-cost scalable treatments using just-in-time adaptive interventions.

Figures
Depressive symptoms measured and frequency of measurements used for each of the 28 reviewed apps.
Heatmap of measurements used to measure symptoms. The heatmap illustrates the number of times symptoms of depression were measured by self-reports or sensors and devices analytics summarized over the 28 reviewed apps. A darker color illustrates a higher number of occurrences, also indicated by the annotation in the cells.