Methods and Evaluation Criteria for Apps and Digital Interventions for Diabetes Self-Management: Systematic Review

Background: There is growing evidence that apps and digital interventions have a positive impact on diabetes self-management. Standard self-management for patients with diabetes could therefore be supplemented by apps and digital interventions to increase patients’ skills. Several initiatives, models, and frameworks suggest how health apps and digital interventions could be evaluated, but there are few standards for this. And although there are many methods for evaluating apps and digital interventions, a more specific approach might be needed for assessing digital diabetes self-management interventions.
Objective: This review aims to identify which methods and criteria are used to evaluate apps and digital interventions for diabetes self-management, and to describe how patients were involved in these evaluations.
Methods: We searched CINAHL, EMBASE, MEDLINE, and Web of Science for articles published from 2015 that referred to the evaluation of apps and digital interventions for diabetes self-management and involved patients in the evaluation. We then conducted a narrative qualitative synthesis of the findings, structured around the included studies’ quality, methods of evaluation, and evaluation criteria.
Results: Of 1681 articles identified, 31 fulfilled the inclusion criteria. A total of 7 articles were considered of high confidence in the evidence. Apps were the most commonly used platform for diabetes self-management (18/31, 58%), and type 2 diabetes (T2D) was the targeted health condition most studies focused on (12/31, 38%). Questionnaires, interviews, and user-group meetings were the most common methods of evaluation. Furthermore, the most evaluated criteria for apps and digital diabetes self-management interventions were cognitive impact, clinical impact, and usability. Feasibility and security and privacy were not evaluated by studies considered of high confidence in the evidence.
Conclusions: There were few studies with high confidence in the evidence that involved patients in the evaluation of apps and digital interventions for diabetes self-management. Additional evaluation criteria, such as sustainability and interoperability, should be focused on more in future studies to provide a better understanding of the effects and potential of apps and digital interventions for diabetes self-management.


Introduction
As the number of people with diabetes continues to rise worldwide [1], the need to increase patients' self-management skills is crucial to improve clinical outcomes and reduce health-related costs [2,3]. There is growing evidence that apps and digital interventions such as websites (web), social media, and other online services have a positive impact on diabetes self-management [4][5][6][7][8][9][10][11][12], suggesting that standard self-management could be supplemented by digital interventions to aid and improve patients' skills [4][5][6][7][8][9][10][11][12]. While some apps and digital interventions have benefited patients, not all of them seem to be based on research, and some of these digital interventions could even compromise the safety of patients with diabetes [13].
To improve diabetes self-management with apps and digital interventions, the World Health Organization and the European Commission [14,15] deem it necessary that the available apps and digital interventions are accurate and reliable. Several initiatives, models, and frameworks suggest how some of these apps and digital interventions could be evaluated [16][17][18][19]. These approaches commonly name background information, privacy and security, evidence on the provided information, ease of use, or interoperability as issues that need to be addressed [16][17][18]. Several methods of differing complexity have been proposed for evaluating these criteria. They range from simple questions to be answered by health care professionals (HCPs) and patients to more complex methodological approaches used by researchers, such as laboratory-based testing, field testing, and N-of-1 design [18,20]. Although the aforementioned issues are relevant for diabetes self-management apps and digital interventions, a more specific approach is needed for assessing the growing number and rapidly changing functionalities of these digital diabetes self-management interventions.
Another relevant issue is who should be involved in these evaluations. As patients are often required to make critical decisions based on their own generated health information [21], people with diabetes should be involved in these evaluations. However, a previous assessment of digital health interventions demonstrated limited consideration of the perceptions of users and of health care personnel [22].
In this systematic review, we identify the specific methods and evaluation criteria that were used to assess apps and digital interventions for diabetes self-management. We also report how patients were involved in these assessments.

Methods
This review followed the PRISMA approach [23], and its systematic review protocol is registered in PROSPERO (Registration number: CRD42018115246).

Data Sources and Search Strategy
We performed a single data search in June 2018. The search strategy covered all studies that assessed diabetes self-management apps and digital interventions, involved patients, and were published in English from 2015 onward. We chose a short search period to get a rapid overview of the most recent methods and evaluation criteria. The search strategy covered the following databases: CINAHL, EMBASE, MEDLINE, and Web of Science. The full search strategy is available in Multimedia Appendix 1.

Inclusion and Exclusion Criteria
We included articles for review if they (1) were primary studies referring to the evaluation of apps or digital interventions for diabetes self-management; and (2) involved patients in the evaluation.
Articles were excluded if (1) the evaluation only measured medical values (ie, weight, glycated hemoglobin [HbA1c], blood glucose); (2) it was not a primary study; (3) it did not focus on apps or digital interventions for diabetes self-management; (4) the full text was not available; (5) it was not a peer-reviewed publication; (6) it was not in English; or (7) it was published before 2015.
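The criteria above can be read as a simple screening rule. The following is a minimal sketch of that rule, not the procedure the reviewers actually used (screening was done manually by independent reviewers). The `Article` record, its field names, and the helper functions are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Article:
    """Hypothetical screening record; all field names are illustrative."""
    title: str
    year: int
    language: str
    is_primary_study: bool
    is_peer_reviewed: bool
    full_text_available: bool
    focuses_on_digital_self_management: bool
    involves_patients: bool
    only_medical_values: bool  # evaluation measured only weight, HbA1c, etc.


def exclusion_reasons(a: Article) -> List[str]:
    """Return every reason an article fails the stated criteria (empty if none)."""
    reasons = []
    if a.only_medical_values:
        reasons.append("evaluation only measured medical values")
    if not a.is_primary_study:
        reasons.append("not a primary study")
    if not a.focuses_on_digital_self_management:
        reasons.append("not focused on apps or digital interventions")
    if not a.full_text_available:
        reasons.append("full text not available")
    if not a.is_peer_reviewed:
        reasons.append("not peer reviewed")
    if a.language.lower() != "english":
        reasons.append("not in English")
    if a.year < 2015:
        reasons.append("published before 2015")
    if not a.involves_patients:
        reasons.append("patients not involved in the evaluation")
    return reasons


def is_eligible(a: Article) -> bool:
    """An article is eligible only if no exclusion reason applies."""
    return not exclusion_reasons(a)
```

Returning the full list of reasons, rather than a single yes or no, makes it easier to report why each record was excluded, as a PRISMA-style flow diagram typically requires.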

Eligibility and Data Collection Procedure
We uploaded all references captured by the search strategy to Rayyan and EndNote and removed duplicates. The eligibility of the articles was assessed in two stages. In the first stage, 2 independent reviewers (PR and EG) examined all titles and abstracts. Doubts about eligibility were discussed and resolved with a third and a fourth reviewer (KA and EÅ). In the second stage, the full texts of the selected articles were carefully examined by 2 independent reviewers (PR and EG) to confirm their eligibility.
Two reviewers (PR and MB) independently extracted and recorded the data from these articles on an Excel spreadsheet (Microsoft). We extracted the following information from each article: type of platform, targeted health condition, study population, methods of evaluation, and evaluation criteria. Incongruences with the data extraction were discussed among the research group.

Confidence in the Evidence and Risk of Bias Assessment
Two reviewers (EG and KA) assessed the confidence in the evidence and risk of bias of the articles. We used an approach based on the CERQual guidelines [24] to assess the confidence in the evidence of the qualitative primary studies, by evaluating their methodological limitations, relevance, and adequacy. We followed the GRADE guidelines [25] to assess mixed-methods studies, quantitative studies, and randomized trials.

Strategy for Data Synthesis
We provide a narrative qualitative synthesis of the findings from the included articles, structured around confidence in the evidence and risk of bias; type of platform (apps, web, or multiplatform [ie, ≥2 types of platform delivering the same intervention in a study]); targeted health condition (type 1 diabetes [T1D], T2D, gestational diabetes mellitus, both T1D and T2D, and unspecified diabetes type); methods of evaluation (questionnaires, interviews, user-group meetings, health measures, system usage analysis, or other); and evaluation criteria (usability, clinical impact, cognitive impact, behavioral impact, feasibility, engagement, acceptability and acceptance, or security and privacy).
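The synthesis categories listed above amount to a small controlled vocabulary. The sketch below shows one way such a vocabulary could be encoded so that each extracted study is checked against it. The `ExtractionRecord` class and the exact label strings are assumptions made for illustration; the review itself recorded the extracted data in a spreadsheet.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Category labels taken from the synthesis strategy; the spellings are illustrative.
PLATFORMS = {"app", "web", "multiplatform"}
CONDITIONS = {"T1D", "T2D", "gestational diabetes mellitus",
              "T1D and T2D", "unspecified"}
METHODS = {"questionnaires", "interviews", "user-group meetings",
           "health measures", "system usage analysis", "other"}
CRITERIA = {"usability", "clinical impact", "cognitive impact",
            "behavioral impact", "feasibility", "engagement",
            "acceptability and acceptance", "security and privacy"}


@dataclass(frozen=True)
class ExtractionRecord:
    """Hypothetical per-study record validated against the vocabularies above."""
    study_id: str
    platform: str
    condition: str
    methods: FrozenSet[str]
    criteria: FrozenSet[str]

    def __post_init__(self) -> None:
        # Reject any label that is not part of the agreed vocabulary.
        if self.platform not in PLATFORMS:
            raise ValueError(f"unknown platform: {self.platform!r}")
        if self.condition not in CONDITIONS:
            raise ValueError(f"unknown condition: {self.condition!r}")
        unknown_methods = set(self.methods) - METHODS
        if unknown_methods:
            raise ValueError(f"unknown methods: {sorted(unknown_methods)}")
        unknown_criteria = set(self.criteria) - CRITERIA
        if unknown_criteria:
            raise ValueError(f"unknown criteria: {sorted(unknown_criteria)}")
```

A record such as `ExtractionRecord("S1", "app", "T2D", frozenset({"questionnaires"}), frozenset({"usability"}))` is accepted, whereas an unlisted platform label raises `ValueError`.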

Identified Evaluation Methods
The methods of evaluation were grouped into 6 categories: questionnaires, interviews, user-group meetings, health-related measures, system usage analysis, and other measurements. We also identified 20 specific methods that were either used once or multiple times by the studies during the evaluation process.
The interrater agreement for the methods of evaluation was found to be κ=0.550, which represents a moderate agreement [57]. A summary of the specific methods of evaluation and studies that used them is presented in Table 1.
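For readers unfamiliar with the statistic, the sketch below shows how a κ value can be computed for two raters who each assign one category per item. It assumes Cohen's kappa for two raters; the text does not state which variant was used or how the ratings were encoded, so the function and the toy labels are illustrative only.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)

    # Proportion of items on which the two raters chose the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Toy example with made-up labels (not data from the review).
a = ["questionnaire", "questionnaire", "interview", "interview"]
b = ["questionnaire", "interview", "interview", "interview"]
kappa = cohens_kappa(a, b)  # observed 0.75, expected 0.5, so kappa is 0.5
```

Because κ corrects for chance agreement, it is lower than the raw proportion of matching labels whenever the raters share similar marginal frequencies.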

Identified Evaluation Criteria
The evaluated criteria were grouped into 8 categories: usability, clinical impact, cognitive impact, behavioral impact, feasibility, engagement, acceptability and acceptance, and security and privacy. The included studies evaluated one or several of these identified criteria. The interrater agreement (κ) for the evaluation criteria was found to be 0.563, which represents a moderate agreement [57].
Qualitative and mixed-method studies that used thematic analysis in their evaluation focused mostly on usability as an evaluation criterion. Three of the studies considered of high confidence in evidence were qualitative and mixed-method studies. Of these, 2 evaluated cognitive impact [36,52] and usability [27,52], and 1 evaluated engagement [36]. Figure 3 shows the number of studies that used each of the specific methods to evaluate the identified criteria. It illustrates that several methods were used to evaluate one criterion in a single study. Likewise, some studies evaluated several criteria using one or more of the identified methods of evaluation. For example, of the 31 included studies, 9 [31,33,36,40,42,43,45,50,54] evaluated cognitive impact using standardized questionnaires.
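A method-by-criterion count of this kind can be derived from per-study records. The following sketch counts, for each pair of method and criterion, the number of distinct studies that used that method to evaluate that criterion. The input format and the sample triples are hypothetical; they are not the data behind Figure 3.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (study_id, method, criterion)


def cross_tabulate(records: Iterable[Triple]) -> Dict[Tuple[str, str], int]:
    """Count distinct studies per (method, criterion) pair."""
    studies: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for study_id, method, criterion in records:
        # A set ensures a study is counted once even if it is listed twice.
        studies[(method, criterion)].add(study_id)
    return {pair: len(ids) for pair, ids in studies.items()}


# Made-up records for illustration only.
sample = [
    ("S1", "standardized questionnaire", "cognitive impact"),
    ("S1", "semistructured interview", "usability"),
    ("S2", "standardized questionnaire", "cognitive impact"),
    ("S2", "standardized questionnaire", "cognitive impact"),  # duplicate row
    ("S3", "standardized questionnaire", "usability"),
]
counts = cross_tabulate(sample)
```

Counting distinct study identifiers, rather than rows, matters here because a single study may apply several methods to one criterion or one method to several criteria.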

Summary of the Findings
This review aimed to identify the existing methods and criteria used to assess apps and digital diabetes self-management interventions that involved patients in their evaluations. A total of 31 articles were included in the review, 7 of which were considered of high confidence in the evidence [27,33,36,43,51,52,54]. More than half of the studies (18/31, 58%) focused on the evaluation of apps for diabetes self-management, and 12 of the 31 studies addressed T2D. The most commonly used methods of evaluation were questionnaires, interviews, and user-group meetings. The most used evaluation criteria to assess apps and digital interventions for diabetes self-management were cognitive impact, clinical impact, and usability.

Specific Evaluation Criteria and Diabetes Patients' Assessment
In our review, we found that studies dealing specifically with apps and digital interventions for diabetes self-management focus on evaluating aspects related to the technology and to users' interaction with it (ie, acceptability and acceptance, and engagement). In addition, these studies focus on the impact that these digital self-management interventions have on the individual. Behavioral impact, cognitive impact, and clinical impact were used as relevant criteria for assessing all types of digital interventions for diabetes self-management. It is vital to measure the interventions' impact on their users because those that have shown benefits related to behavioral, cognitive, and clinical impact could reduce health-related costs [2,3].
Evidence shows that involving individuals in the assessment of different health interventions has a positive impact on health [58]. We found few articles (n=31) in this review that involved patients in the evaluation of apps and digital interventions for diabetes self-management. The evaluations in which patients were involved mostly focused on usability and cognitive impact. Evaluation criteria that could measure patients' continuous use of these apps and digital interventions for self-management could supplement both their qualitative responses and the more static traditional and clinical criteria. This is an opportunity for improvement, as none of the studies in this review evaluated the same criterion using both qualitative results from patients and quantitative measures.
Involving patients with diabetes in assessing apps and digital self-management interventions and obtaining their feedback regarding additional evaluation criteria could also increase our knowledge about the features that support engagement with these technologies. This could also help create better digital health interventions that encourage more continuous and effective use [59]. The most common methods of evaluation with the patients were questionnaires, interviews, and user-group meetings. Simple methods such as these elicit the opinions and perceptions of users and encourage them to critically analyze self-management apps and digital interventions. Therefore, such methods should be used in conjunction with the complex methods used by researchers and developers [18,20], especially to measure the same criterion.

Improving Reported Evaluations of Digital Interventions for Diabetes Self-Management
Apps and digital health interventions have evolved quickly. Yet, compared with other sectors, the health industry seems to be behind with regard to digitalization [60]. Currently, most apps and digital interventions for self-management are not recommended as part of the treatment plan, possibly because their design and development do not take sustainability into consideration [61]. In fact, digital health interventions rarely advance beyond a pilot phase [62,63] or the duration of an intervention study.
In 2016, the mobile health (mHealth) evidence reporting and assessment checklist was developed by the World Health Organization to help with reporting evidence of the effectiveness of mHealth interventions [64]. The checklist recommended reporting on items that touch on sustainability, scalability, and transparency, such as infrastructure, interoperability, contextual adaptability, and replicability. These items still receive little attention in current studies. Future studies should also consider these evaluation criteria in addition to gender and equity issues associated with the use of apps and digital interventions for diabetes self-management.
Evaluation reports for apps and digital interventions for diabetes self-management must be standardized, as recommended by the CONSORT-EHEALTH guidelines for reporting digital health interventions [65]. The lack of standardization made it challenging to compare studies, as different authors used different terminologies to describe the same evaluation criterion. For example, one study [37] used the term heuristics evaluation, which was grouped under usability because it evaluated measures such as the visibility of app status, ease of input, and readability. Likewise, another study [32] evaluated satisfaction, which falls under usability because it evaluated, among others, visual attractiveness and ease of use.
As electronic health (eHealth) research is a multidisciplinary field, we assume that the authors chose these terms based on the various educational or professional backgrounds and the corresponding target audiences. By following the World Health Organization classification of digital health interventions [66], terminologies related to the evaluation of apps and digital interventions for diabetes self-management could be standardized to facilitate straightforward interpretation and aggregation of research evidence.

Association Between Methods Used and Criteria Evaluated
In our review, we found an almost even split among studies that used standardized questionnaires, author-created questionnaires, and semistructured interviews to evaluate usability. Our results are to some extent in line with the findings of a previous review that found that usability was mainly assessed through polls and questionnaires [67]. The usability of a digital self-management intervention is crucial to its successful adoption, its acceptance, and the individual's engagement with it. In addition, we found that cognitive impact was often assessed not only through standardized questionnaires, but also through semistructured interviews.
Comparing the methods for the evaluation of usability with those for the evaluation of cognitive impact, we identified that it was more common to use author-created questionnaires for usability. A possible explanation might be the wide variety of intervention delivery platforms (eg, different types of apps and online resources) that might create different evaluation needs not captured in existing standardized usability questionnaires. Another explanation might be the different research traditions in different disciplines. Usability might more often be a concern of computer science researchers, whereas cognitive impact might more often be a concern of health researchers and professionals.
Finally, health outcomes were almost exclusively evaluated by medical tests, showing the preference of health researchers and professionals in using standardized tests to determine the impact of digital interventions. Several other methods can be used to evaluate multiple criteria; however, depending on the aim and the type of study, researchers must endeavor to exhaust all available methods to ensure consistency of results.

Feasibility of Using Digital Self-Management Interventions in Clinical Workflow
Although most apps and digital health interventions are intended for self-management, some of them also provide access to the health care system, such as communication with HCPs and electronic health journals. The reviewed studies consistently reported that this is in response to patients' interest in being able to contact their HCPs or share results with their health care team (eg, their blood glucose results). This was reported not only by studies in our review [35,36,42,44,47,48,56] but also by industry research groups [68,69]. This implies the potential and expectation for further involvement of HCPs in patients' use of apps and digital interventions for diabetes self-management.
Several studies, including many in this review, have shown that involving HCPs in digital interventions is associated with improved self-management of diabetes and the success of these interventions [31,48,49,52,70,71,72]. Therefore, studies focusing on apps and digital interventions for diabetes self-management should evaluate the possibilities of effortlessly integrating these interventions in the workflow of HCPs, that is, their connection and interaction with electronic health journals and other existing health information systems. Such an integration can be achieved by evaluating the infrastructure needed for digital self-management interventions [64].

Limitations and Strengths
The search for articles covered a short period (2015-2018) and focused on articles published in the English language. Therefore, we may have missed relevant studies that reported additional evaluation methods or evaluation criteria. Our interrater agreement of the data extraction was only moderate; however, all incongruences were discussed among the research group. Our findings have provided a useful overview of the recent evaluation methods and criteria that researchers are using to assess current apps and digital interventions for diabetes self-management. Furthermore, our review included both quantitative and qualitative studies which provided a better characterization of different evaluation methods and criteria that are being used to assess digital diabetes self-management interventions.

Conclusions
Only a few studies involved patients in the evaluation of apps and digital interventions for diabetes self-management, and even fewer were considered of high confidence in the evidence. The most common evaluation methods were questionnaires, interviews, and user-group meetings, whereas the most commonly evaluated criteria were cognitive impact, clinical impact, and usability. Studies with high confidence in the evidence did not evaluate feasibility or security and privacy. Moreover, patients were not involved in evaluating security and privacy, which was evaluated in only 2 [29,44] of the included studies.
For apps and digital interventions for diabetes self-management to be implemented successfully and used continuously, it is important that patients are involved in evaluating every criterion. In that way, they can contribute to the development and modification of these digital interventions to better meet their specific self-management needs. Furthermore, the methods and criteria evaluated in digital diabetes self-management interventions should be expanded to assess and ensure sustainability and interoperability. In addition, studies should evaluate the association between the cognitive, clinical, and behavioral impact of these apps and digital interventions and health-related costs for individuals with diabetes. This could help improve health care associated with the management of diabetes and promote the incorporation of apps and digital interventions for self-management in the services provided at health care facilities.

Acknowledgments
This project is funded by Helse Nord (HNF1425-18). The coauthors acknowledge the advice of the project's Advisory board: Professor Gunnar Hartvigsen, Anne Grethe Olsen MD, and Dr. Med. Anne Helen Hansen, and also the support and contributions of Per Erlend Hasvold in his role as an "internal reviewer." Furthermore, we thank Dr. Steven Bradway for his assistance with the coarse data extraction and organization at the start of this review. The publication charges for this article have been funded by a grant from the publication fund of UiT The Arctic University of Norway.

Authors' Contributions
KA was responsible for database searching; EG, EÅ, KA, and PR were responsible for title, abstract, and full-text screening; MB and PR performed independent data extraction; EG and KA evaluated risk of bias; and DL, EG, EÅ, MB, KA, and PR performed data analysis and interpretation. All the coauthors contributed to drafting and revising the review. All coauthors approved the final version of the manuscript.

Conflicts of Interest
None declared.