This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
Mental health apps (MHAs) provide opportunities for accessible, immediate, and innovative approaches to better understand and support the treatment of mental health disorders, especially those with a high burden, such as bipolar disorder (BD). Many MHAs have been developed, but few have had their effectiveness evaluated.
This systematic scoping review explores current process and outcome measures of MHAs for BD with the aim of providing a comprehensive overview of current research. The findings are intended to identify best practice for evaluating MHAs for BD and to inform future studies.
A systematic literature search of the health science databases PsycINFO, MEDLINE, Embase, EBSCO, Scopus, and Web of Science was undertaken up to January 2021 (with no start date) to narratively assess how studies had evaluated MHAs for BD.
Of 4051 original search results, 12 articles were included. These 12 studies included 435 participants, and of these, 343 had BD type I or II. Moreover, 11 of the 12 studies provided the ages (mean 37 years) of the participants. One study did not report age data. The male to female ratio of the 343 participants was 137:206. The most widely employed validated outcome measure was the Young Mania Rating Scale, being used 8 times. The Hamilton Depression Rating Scale-17/Hamilton Depression Rating Scale was used thrice; the Altman Self-Rating Mania Scale, Quick Inventory of Depressive Symptomatology, and Functional Assessment Staging Test were used twice; and the Coping Inventory for Stressful Situations, EuroQoL 5-Dimension Health Questionnaire, Generalized Anxiety Disorder Scale-7, Inventory of Depressive Symptomatology, Mindfulness Attention Awareness Scale, Major Depression Index, Morisky-Green 8-item, Perceived Stress Scale, and World Health Organization Quality of Life-BREF were used once. Self-report measures were captured in 9 different studies, 6 of which used MONARCA. Mood and energy levels were the most commonly used self-report measures, being used 4 times each. Furthermore, 11 of the 12 studies discussed the various confounding factors and barriers to the use of MHAs for BD.
Low reported adherence rates, usability challenges, and privacy concerns act as barriers to the use of MHAs for BD. Moreover, as MHA evaluation is itself still developing, guidance for clinicians on how to support patient choices in mobile health also needs to be developed. These obstacles could be ameliorated by incorporating co-production and co-design using participatory patient approaches during the development and evaluation stages of MHAs for BD. Further, including qualitative aspects in trials that examine patient experience of both mental ill health and the MHA itself could result in a more patient-friendly, fit-for-purpose MHA for BD.
There are many critical factors that can influence the course and outcome of mental health disorders. Two key factors are (1) early and accurate identification of the first onset and subsequent relapses of the disorder, leading to the institution of appropriate management, and (2) access to appropriate treatment locations. For bipolar disorder (BD), the average delay between the onset of symptoms and the first institution of treatment can be as long as 10 to 15 years [
In 2020, an estimated 6 billion smartphones were in use across the globe [
Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flowchart for scoping reviews.
Currently, MHAs can be seen to improve engagement and accessibility for individuals in rural areas where health care provision is increasingly difficult to access [
The socioeconomic cost of BD in the United Kingdom is well recognized [
As the development of an MHA for BD requires iterative processes with stakeholders outside of the academic and clinical research environment, process evaluation is important (in addition to more traditional outcome measure methods) to ensure that the app remains user-friendly and functional without compromising clinical outcomes. This scoping review aims to address this research gap by mapping the existing literature on process and outcome evaluation methods of MHAs for BD, in order to improve understanding of currently available evaluation tools and the latest practice.
The aim of this scoping review is to systematically explore current process and outcome measures to identify the best practice for evaluating MHAs for BD. The focus is on apps for BD designed for individuals across the lifespan. Conducting a scoping review will allow health care systems to be more structurally informed on how to accurately evaluate the effectiveness of such technologies when implementing them into routine care [
The review was informed by the Arksey and O’Malley 5-step framework [
A scoping search was initially performed in the following databases: PsycINFO, MEDLINE, Embase, EBSCO, Scopus, and Web of Science. Relevant search terms were identified from key papers, and the search strategy was developed iteratively in MEDLINE and then translated across the other databases, up to January 2021. Due to time constraints, grey literature sources were not investigated, and the search was limited to articles published in English, as no resources were available to undertake translation work. Broad search terms were used to reduce the likelihood of article omission. The complete search strategy for MEDLINE is available in
For studies to be included in this review, they needed to meet the following inclusion and exclusion criteria. Studies were included if they (1) were related to BD, (2) targeted individuals across the lifespan, (3) included qualitative and/or quantitative evaluation methods and measures, (4) were published in the English language, (5) were published at any time (no start date limit was applied), (6) used any type of study design, (7) included participants with symptoms of BD or diagnosed with BD according to International Classification of Diseases-10, Diagnostic and Statistical Manual of Mental Disorders-IV, or Diagnostic and Statistical Manual of Mental Disorders-5, (8) included evaluation of the functionality of the MHA and/or evaluation of the participant outcomes, and (9) evaluated an MHA with any function (eg, screening, mood monitoring, or medication adherence). Studies were excluded if they (1) were based on a web-based intervention with no MHA counterpart, (2) included MHAs that were only psychotherapeutic intervention specific with no evaluation, (3) were based on MHA development only, or (4) included a participant population without symptoms of BD or not diagnosed with BD. Systematic review papers identified by the search were not included; however, their reference lists were hand searched to identify primary articles relevant for inclusion.
Given that the aim of the study was to recognize the scope of research already conducted, both qualitative and quantitative research designs were included. As few studies focused on children and adolescents as their participant population, no age limits were applied.
The search was completed by 2 researchers (IT and PK), who independently screened articles by the title and abstract against the inclusion criteria. Articles that fulfilled the inclusion criteria were then subjected to full-text screening by IT and PK. Conflicts were discussed with a third researcher (EBM) to reach consensus. Eight conflicts arose altogether, including 6 when screening the title, 1 when screening the title and abstract, and 1 when screening the full paper.
Characterization of the data and the results were exported into a customized data extraction form that was piloted in a subset of included studies. Data extracted included study name, authors, year, country, study design, MHA for BD, whether the MHA for BD was independent or adjunctive, sample size, mean age of the participants, gender of the participants, inclusion/exclusion criteria for the participants, results, tools used, measures used, time points, and whether it addressed any of the 4 objectives.
EC and IT analyzed the data using narrative synthesis and placed this in the context of the current literature to formulate conclusions. The studies were assessed using the Mixed Methods Appraisal Tool (MMAT) [
The original database search yielded 4051 articles. Hand searching of relevant review articles was conducted, which yielded a further 5 articles. After duplicates were removed, 3730 articles remained. Screening of the title and abstract resulted in 3642 articles being excluded. The remaining 88 articles were then subjected to full-text screening, and 76 articles were excluded (71 due to a lack of focus on BD, 2 could not be located, 1 only focused on app development, 1 did not diagnose participants according to our specified criteria, and 1 had no app evaluation).
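The screening counts reported above can be checked with simple arithmetic. The following is a minimal sketch rather than part of the review's methods: the variable names are illustrative, and the number of duplicates removed is derived under the assumption that the 5 hand-searched articles were added to the pool before deduplication (the text does not report this figure directly).

```python
# Counts as reported in the text; all variable names are illustrative.
database_hits = 4051
hand_search_hits = 5
after_deduplication = 3730
excluded_title_abstract = 3642

# Reasons for exclusion at the full-text stage, as reported.
full_text_exclusions = {
    "lack of focus on BD": 71,
    "could not be located": 2,
    "app development only": 1,
    "diagnosis not per specified criteria": 1,
    "no app evaluation": 1,
}

# Assumption: hand-searched articles were pooled before deduplication.
total_identified = database_hits + hand_search_hits
duplicates_removed = total_identified - after_deduplication  # derived, not reported

full_text_screened = after_deduplication - excluded_title_abstract
full_text_excluded = sum(full_text_exclusions.values())
included = full_text_screened - full_text_excluded

print(f"Identified: {total_identified}, duplicates removed: {duplicates_removed}")
print(f"Full-text screened: {full_text_screened}, excluded: {full_text_excluded}")
print(f"Included: {included}")
```

Under these assumptions, the derived full-text screening, exclusion, and inclusion counts agree with the reported values of 88, 76, and 12.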
Overall, 12 studies were identified as part of this review, which evaluated 7 MHAs for BD.
Assessment of the quality of all 12 studies (quantitative and mixed methods) was performed (IT and EBM) using the MMAT (version 2018) [
Six studies examined the self-perceived participant usability and functionality of the MHAs for BD. This examination ranged from detailed feedback questionnaires given to the participants [
Two studies recognized that technical problems were likely to arise and so implemented a system to solve these problems. Hidalgo-Mazzei et al [
A variety of validated outcome measures were used to evaluate the selected MHAs for BD. The most widely employed measure was the Young Mania Rating Scale (n=8) [
Eleven of the 12 studies discussed confounding factors affecting the efficacy of MHAs for BD. These confounding factors and the number of studies in which each was mentioned are shown in
Identified potential confounders for mental health app efficacy.
| Potential confounder | Number of studies in which mentioned |
| --- | --- |
| Participants were mainly stable or euthymic | 4 |
| Participant insight when experiencing a manic phase varies | 3 |
| Sample size was too small | 3 |
| Length of study was too short | 3 |
| Low retention or adherence rate | 2 |
| Method of objective data collection was not robust enough | 2 |
| Patients were found to be capable of experiencing both manic and depressive symptoms concurrently | 1 |
| Questionnaires given were too simplistic | 1 |
| Opportunity of free-text input not given | 1 |
| Error in the app | 1 |
| Change in mobile phone communication habits | 1 |
| Low prevalence of manic symptoms | 1 |
| Potential sampling bias | 1 |
| Order of questions in the questionnaire did not vary and so was open to mindless input | 1 |
| Scales not delivered often enough | 1 |
| Participants switched the mental health app (MHA) off during the study | 1 |
| Participants may have chosen not to complete the surveys due to their mood | 1 |
| Subjective scales | 1 |
| The MHA gave daily confrontation with depressive symptoms | 1 |
| The MHA was not sensitive enough to manic or depressive symptoms | 1 |
| Participants were already involved in a medication adherence intervention | 1 |
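As an illustration of how the tabulated confounders might be summarized, the sketch below tallies the mention counts from the table. The shortened labels are our own paraphrases of the table rows and are not terminology from the included studies. Because a single study can mention several confounders, the total is a count of mentions, not of studies.

```python
# Mention counts copied from the table; labels are shortened paraphrases.
confounder_mentions = {
    "mainly stable or euthymic participants": 4,
    "variable insight during manic phase": 3,
    "small sample size": 3,
    "short study length": 3,
    "low retention or adherence": 2,
    "objective data collection not robust": 2,
    "concurrent manic and depressive symptoms": 1,
    "questionnaires too simplistic": 1,
    "no free-text input": 1,
    "error in the app": 1,
    "change in phone communication habits": 1,
    "low prevalence of manic symptoms": 1,
    "potential sampling bias": 1,
    "fixed question order": 1,
    "scales not delivered often enough": 1,
    "participants switched the MHA off": 1,
    "surveys skipped due to mood": 1,
    "subjective scales": 1,
    "daily confrontation with depressive symptoms": 1,
    "MHA not sensitive enough to symptoms": 1,
    "existing medication adherence intervention": 1,
}

total_mentions = sum(confounder_mentions.values())

# Confounders raised by more than one study versus by a single study.
recurring = {name: n for name, n in confounder_mentions.items() if n > 1}
single = [name for name, n in confounder_mentions.items() if n == 1]

print(f"{len(confounder_mentions)} distinct confounders, {total_mentions} mentions")
for name, n in sorted(recurring.items(), key=lambda item: -item[1]):
    print(f"{n} studies: {name}")
print(f"{len(single)} confounders were mentioned by one study only")
```

A tally of this kind makes clear that most of the listed confounders were raised by a single study, with only a handful recurring across studies.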
Only 5 of the 12 studies commented on the future of the evaluation of MHAs for BD. Streicher et al [
Osmani et al [
Schwartz et al [
Faurholt-Jepsen et al [
The aim of this scoping review was to better understand how MHAs for BD are being evaluated, particularly in terms of the process of use and outcome measures. Owing to the scarcity of studies evaluating MHAs for BD specifically, some inferences in this discussion have been drawn from studies evaluating general MHAs. This relies on the assumption that the functions of general MHAs and MHAs for BD are similar.
The need for effective and diligent evaluation of MHAs for BD is well established in the literature; no credentialing is currently required for their development and release. Karcher et al [
One barrier in the development of MHA evaluation systems is the adherence rate. O’Connell [
Interestingly, one reason for the low utilization of MHAs for BD may be decreased motivation, which is often a key feature of depressive episodes [
Torous et al [
Torous et al [
As well as employing participative engagement in MHA development, improving user awareness can come from creative measures, such as describing or advertising the MHA appropriately. Over a quarter of MHAs for depression failed to mention depression in the title or description [
Moving to the user-identified priority of safety [
From a more pragmatic standpoint, MHA cost may be a factor in choosing the right MHA for BD, with 76% of people surveyed reporting interest in using their mobile phones for mental health monitoring and self-management if the MHA was free of charge [
This review had its own limitations. Only 7 MHAs for BD were evaluated, somewhat limiting the generalizability of the results of this review. Furthermore, only 5 studies commented on the future of the development and evaluation of MHAs for BD.
The studies in our review focused on patient monitoring as an indicator for process and outcome evaluation in MHAs for BD. They based their conclusions on whether the app improved assessment scores rather than on interviews with patients about their experience of using the app. Although this is suggested to be a reliable way of measuring process and outcome values, as modern medicine shifts to holistic patient-centered care, more emphasis should be placed on users’ experiences rather than on quantitative outcomes alone. This could help patients feel respected and involved in the design of MHAs for BD, which may in turn increase adherence rates in both the short and long term.
Personalized medicine is a rapidly emerging movement in the field of health care. It is defined as a move away from the “one size fits all” approach to treatment, with new approaches and targeted therapies allowing for flexibility in the management of diseases. With this in mind, more MHAs for BD should be easily available in order to encourage patient choice and freedom to choose an MHA that is best suited to them. At the moment, MONARCA dominates the market, reducing the range and scope of MHAs for BD. As NHS England suggests [
The field of MHAs for BD shows promise in both improving patient care and creating a more cost-effective health care service [
MEDLINE and PsycINFO searches.
Study characteristics.
Study designs and outcomes.
BD: bipolar disorder
DTAC: Digital Technology Assessment Criteria
MHA: mental health app
MMAT: Mixed Methods Appraisal Tool
NHS: National Health Service
IT: acquisition of data, analysis and interpretation of data, drafting the article, and final approval of the version to be published. EC: acquisition of data, analysis and interpretation of data, drafting the article, and final approval of the version to be published. KG: design of the study, revising the manuscript critically for important intellectual content, and final approval of the version to be published. PK: design of the study, acquisition of data, analysis and interpretation of data, revising the manuscript critically for important intellectual content, and final approval of the version to be published. JS: concept of the study, revising the manuscript critically for important intellectual content, and final approval of the version to be published. EBM: concept and design of the study, interpretation of the data, revising the manuscript critically for important intellectual content, and final approval of the version to be published. ANS: concept and design of the study, revising the manuscript critically for important intellectual content, and final approval of the version to be published.
None declared.