Compliance With Mobile Ecological Momentary Assessment of Self-Reported Health-Related Behaviors and Psychological Constructs in Adults: Systematic Review and Meta-analysis

Background Mobile ecological momentary assessment (mEMA) permits real-time capture of self-reported participant behaviors and perceptual experiences. Reporting of mEMA protocols and compliance has been identified as problematic within systematic reviews of children, youth, and specific clinical populations of adults. Objective This study aimed to describe the use of mEMA for self-reported behaviors and psychological constructs, mEMA protocol and compliance reporting, and associations between key components of mEMA protocols and compliance in studies of nonclinical and clinical samples of adults. Methods In total, 9 electronic databases were searched (2006-2016) for observational studies reporting compliance to mEMA for health-related data from adults (>18 years) in nonclinical and clinical settings. Screening and data extraction were undertaken by independent reviewers, with discrepancies resolved by consensus. Narrative synthesis described participants, mEMA target, protocol, and compliance. Random effects meta-analysis explored factors associated with cohort compliance (monitoring duration, daily prompt frequency or schedule, device type, training, incentives, and burden score). Random effects analysis of variance (P≤.05) assessed differences between nonclinical and clinical data sets. Results Of the 168 eligible studies, 97 (57.7%) reported compliance in 105 unique data sets (nonclinical=64/105 [61%], clinical=41/105 [39%]). The most common self-reported mEMA target was affect (primary target: 31/105, 29.5% of data sets; secondary target: 50/105, 47.6% of data sets). The median duration of the mEMA protocol was 7 days (nonclinical=7, clinical=12). Most protocols used a single time-based (random or interval) prompt type (69/105, 65.7%); median prompt frequency was 5 per day. The median number of items per prompt was similar for nonclinical (8) and clinical data sets (10).
Most data sets reported mEMA training (84/105, 80%) and provision of participant incentives (66/105, 62.9%). Less than half of the data sets reported the number of prompts delivered (22/105, 21%), the number answered (43/105, 41%), a criterion for valid mEMA data (37/105, 35.2%), or response latency (38/105, 36.2%). Meta-analysis (nonclinical=41, clinical=27) estimated an overall compliance of 81.9% (95% CI 79.1-84.4), with no significant difference between nonclinical and clinical data sets or between estimates before and after data exclusions. Compliance was associated with prompts per day and items per prompt for nonclinical data sets. Although widespread heterogeneity existed across analyses (I²>90%), no compelling relationship was identified between key features of mEMA protocols representing burden and mEMA compliance. Conclusions In this 10-year sample of studies using the mEMA of self-reported health-related behaviors and psychological constructs in adult nonclinical and clinical populations, mEMA was applied across contexts and health conditions and to collect a range of health-related data. There was inconsistent reporting of compliance and key features within protocols, which limited the ability to confidently identify components of mEMA schedules likely to have a specific impact on compliance.


Background
Ecological momentary assessment (EMA) is a survey method that allows collection of data on participant behaviors, affect, and perceptual experiences in real-time (momentary) and real-life environments (ecological) [1]. In its original form, EMA required pen and paper diaries or logs to be completed on random (signal) or fixed (interval) time-based schedules or in response to a specific target behavior, psychological or social event (event-based). With the advent of handheld technologies, mobile EMA (mEMA) and increasingly mobile ecological momentary interventions (mEMIs) can be completed through automated schedules via handheld devices such as tablets and mobile phones.
As mEMA or mEMI have the potential to capture data in real time, the level of recall bias is potentially reduced. In addition, context (where and with whom the respondent is) and antecedents to the specific target behavior or psychological construct can be obtained [1,2]. As a survey approach, mEMA or mEMI has undeniable utility, but data are dependent on participants consistently responding to the mEMA or mEMI schedule (compliance) [3]. Although electronically delivered surveys to personal mobile devices provide a means of time or date stamping and limit the possibility of hoarding and back- or forward-filling [4], concerns have been raised about protocol burden, missing data (especially if systematic), mindless answering, and survey habituation when lengthier questionnaires can be circumvented by a "no" response to initial questions [2]. EMA data with low compliance rates are unlikely to be ecologically valid; however, it is also possible to have good individual compliance with data of questionable accuracy [5,6].
Although Stone and Shiffman [17] highlighted the need for explicit reporting of compliance in their original reporting guidelines for EMA, recurring issues relating to the reporting of compliance include (1) missing, incomplete, or ambiguous data; (2) heterogeneity in reporting; (3) impact of data exclusions; and (4) combining traditional (paper-based) and mEMA data [5]. Participant compliance with mEMA or mEMI is, in theory, related to the total protocol burden, which is a function of monitoring duration, frequency and complexity of prompts, and familiarity with the technology. However, as Jones et al [5] note, to date, there is little compelling, systematic evidence to support an association between EMA burden and compliance rates. These issues make it difficult to determine which, if any, features of EMA protocols positively or negatively influence compliance to EMA schedules.
The purpose of this systematic review is to guide the development of an mEMA protocol, which could be used for future studies of health-related behaviors and psychological constructs (including symptoms) in adults with and without chronic disease. The primary question for this systematic review is as follows: In adult nonclinical and clinical populations, which factors are associated with increased compliance to mEMA protocols for collection of health-related behaviors and psychological constructs (including symptoms)?

Objectives
The objectives of this systematic review were to describe:
1. Health-related behaviors and psychological constructs assessed using mEMA
2. mEMA protocol and compliance reporting
3. Associations between key components of mEMA protocols and participant compliance

Search Registration
The search strategy and review protocol were registered prospectively with the International Prospective Register of Systematic Reviews (PROSPERO 2016: CRD42016051726).

Eligibility
Observational studies (cohort, cross-sectional) of mEMA in adults (≥18 years of age) were eligible for inclusion in this review if they (1) reported participant compliance with mEMA; (2) were a primary study published in English between 2006 and 2016 inclusive; (3) included adults (≥18 years) either apparently healthy (nonclinical population) or with health conditions (clinical population); and (4) collected mEMA data using mobile devices as a primary or secondary outcome. References were excluded if they were (1) experimental designs investigating intervention efficacy; (2) duplicate publications or secondary analyses of the same data set; or (3) conference abstracts, protocols, commentaries (editorials or letters), or systematic or narrative reviews.

Information Sources and Search Strategy
A range of electronic databases were searched to identify eligible studies: AMED (Allied and Complementary Medicine), CINAHL, Cochrane Library and CENTRAL (Cochrane Central Register of Controlled Trials), Embase, MEDLINE (including epub ahead of print), PsycINFO, Scopus, and Web of Science. An academic librarian (Carole Gibbs, University of South Australia) assisted with the development of the search strategy regarding conceptualization, operators (operational terms), and limiters [18] with the final search undertaken during a single week. Search terms and associated MeSH (Medical Subject Heading) alternatives, which were adapted for use in all databases, related to the population (adults), assessment (mEMA), and outcomes of interest (health behaviors, perceptual experiences including symptoms, affect or mood). Key search terms included "ecological momentary assessment," "EMA," "mobile ecological momentary assessment," "mEMA," "electronic diary," "SMS or short message service," "prompting," "text messaging," "health behaviour," "symptom," and "adult." Reference lists of included studies and systematic reviews identified during the search were reviewed to identify additional potentially relevant studies.

Study Selection
The titles and abstracts of studies identified from the search process were screened against a priori eligibility criteria and full-text versions imported into Covidence (Covidence systematic review software, Veritas Health Innovation). Both screening steps were undertaken by individual members of the research team working in pairs (AG and MW, HL and FF) with each person completing the task independently, before meeting with their partner to compare results and resolve disagreements (consensus).

Data Collection
A data extraction template was prospectively developed; it was guided by the Checklist for Reporting EMA studies proposed by Liao et al [10] and pilot-tested on 5 randomly selected eligible studies. Working in pairs (AG and MW, JI and KF, HL and FF), individual members of the research team extracted all data before meeting with their partner to compare results and resolve disagreements by discussion. As this review aimed to describe the features of mEMA schedules associated with increased mEMA protocol adherence, assessment of methodological bias was not planned.

Data Items
Data were extracted across 4 domains:
• Publication demographics: title, authors, year of publication.
• Participants: recruitment source, medical condition or diagnosis (clinical populations), sample size (enrolled, attrition or withdrawn, and included in analysis), and age (mean/median, SD).
• mEMA protocol: target behavior or psychological construct, mobile device type (PDA, palmtop computer, electronic diary, mobile or smartphone, tablet, other), participant training (yes/no), provision of incentives (course credit, financial, other, or none), incentive thresholds (yes/no), monitoring duration (days), prompt type (random signal, interval, event-based), frequency per day, number of questions/items per prompt type (reported or estimated from information reported in studies), strategy to deal with unanswered prompts, and time allowed for survey response. Where authors did not report the number of items per prompt type but instead described standardized instruments that were converted to mEMA survey items, a full version of the standardized instrument was accessed and the number of items calculated.
• mEMA compliance: verbatim (or where possible calculated from reported data), participant completion (number included in analysis, data exclusions), criteria/thresholds for valid mEMA data, number of prompts delivered/answered per person/cohort (planned, actual, average, range), and response latency as time (mean, SD) [8,10].

Data Management
Data were tabulated to provide descriptive summaries. The mEMA surveys commonly included multiple questions reflecting behavioral or psychological constructs. Although the authors of mEMA studies did not always specify the primary outcome for these observational studies, most studies explicitly reported the key variable of interest for mEMA, which we interpreted to be the primary mEMA target. Where other data were also collected by the same mEMA survey, we denoted those as secondary mEMA targets. The primary mEMA target of studies was identified, and studies were grouped and reported according to two broad domains: (1) behavior (eg, dietary, physical activity, and smoking) and (2) psychological construct (eg, affect, cognition, and sensations/symptoms). For each domain, a narrative synthesis was used to summarize participants, mEMA protocol, and compliance data for nonclinical and clinical data sets.
With the exception of device type, where possible, we adopted the operationalization of variables common to Wen et al [9] or Jones et al [5] unless the distribution of our data resulted in very unbalanced cells or our data could provide greater resolution. Potential mEMA protocol factors related to compliance were categorized for analysis. Given ongoing concerns about the burden imposed by EMA schedules and compliance, in addition to these individual factors, we explored a novel composite metric to reflect aspects previously identified as possible contributing factors (monitoring duration, frequency, type, and complexity of prompts).
Where possible, a mEMA burden score was calculated for each study by multiplying: • the total monitoring duration in days (d; all days included in all waves) • by the maximum frequency of time-based prompts (random and interval) per day (f) • by the minimum number of compulsory questions/items within all prompts per day (i) and • by a weighting reflecting the number of prompt types scheduled per day (w; eg, time-based [signal or interval] and/or event-based) with each prompt type weighted as 1 (min weight=1, max=3).
For example, the mEMA burden score for a 14-day monitoring schedule (d), where 5 random signal prompts were delivered per day (f), with each prompt requiring responses to a minimum of 12 items/questions (i; 60 items in total per day), would be 840. If event-based prompts (irrespective of the number of items within the prompt) were added to this schedule (w), the burden score would rise to 1680. Burden scores were calculated and reported in quartiles: 0 to 283.5, 284 to 810, 811 to 1806, or ≥1807.
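The calculation above can be sketched as follows (a minimal illustration of the d × f × i × w product defined in this review; the function name is ours):

```python
def burden_score(days, prompts_per_day, min_items_per_prompt, n_prompt_types):
    """mEMA burden score: d x f x i x w.

    days: total monitoring duration in days across all waves (d)
    prompts_per_day: maximum daily frequency of time-based prompts (f)
    min_items_per_prompt: minimum compulsory items within all prompts (i)
    n_prompt_types: number of prompt types scheduled per day (w; weighted 1-3)
    """
    if not 1 <= n_prompt_types <= 3:
        raise ValueError("prompt-type weighting must be between 1 and 3")
    return days * prompts_per_day * min_items_per_prompt * n_prompt_types

# Worked example from the text: 14 days, 5 signal prompts/day, 12 items each
print(burden_score(14, 5, 12, 1))  # 840
# Adding event-based prompts raises the weighting (w) to 2
print(burden_score(14, 5, 12, 2))  # 1680
```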

Meta-analysis
Random effects restricted maximum likelihood estimator meta-analyses were undertaken using the approach reported by Jones et al [5] and Wen et al [9], with the authors of both advising to assist accurate replication. All statistical analyses were conducted using JASP (Jeffreys's Amazing Statistics Program, version 0.9.2; 2019). Studies were included in the meta-analysis if they reported all data necessary for the meta-analysis procedure and cohort compliance (%) could be extracted before data exclusions when possible. Sensitivity analysis was conducted to explore the impact of compliance rates reported before and after data exclusion. The effect sizes (ESs) were calculated by logit transforming the proportion of completed prompts (ie, compliance rates): ES = ln(p/[1−p]). SEs were then estimated as SE = √(1/[np] + 1/[n(1−p)]), where n is the sample size and p is the proportion.
To adjust for clustering of prompts within participants, the SE was adjusted by the effective sample size (ESS): ESS = nk/(1 + [k−1] × ICC), where k is the number of study prompts, n is the number of participants, and ICC is either the reported intraclass correlation coefficient or the SD of reported compliance; the ESS then replaces n in the SE equation above, with p the proportion of completed prompts.
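A minimal sketch of these transformations is given below. The function names and the illustrative values are ours, and the ESS uses the standard Kish formula, which is our assumption consistent with the description above; this is not the authors' analysis code.

```python
import math

def logit_es(p):
    """Effect size: logit of the cohort compliance proportion p."""
    return math.log(p / (1 - p))

def logit_se(p, n):
    """SE of a logit-transformed proportion: sqrt(1/(np) + 1/(n(1-p)))."""
    return math.sqrt(1 / (n * p) + 1 / (n * (1 - p)))

def effective_sample_size(n, k, icc):
    """Kish ESS for k prompts clustered within n participants (assumption)."""
    return (n * k) / (1 + (k - 1) * icc)

def inv_logit(x):
    """Back-transform a logit ES to a proportion for reporting."""
    return 1 / (1 + math.exp(-x))

# Hypothetical values: 50 participants, 35 prompts each, compliance 0.82, ICC 0.3
es = logit_es(0.82)
ess = effective_sample_size(50, 35, 0.3)   # clustering shrinks n*k = 1750 to 156.25
se = logit_se(0.82, ess)                   # SE computed on the ESS, not raw n*k
print(round(inv_logit(es), 2))             # back-transforms to 0.82
```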
For studies that did not report SD data, sensitivity analyses were conducted by computing the SEs using the 25th and 75th percentiles of available SDs. As these sensitivity analyses did not show any differences, the analysis used the imputed median SD where the original SD was not reported. To aid interpretation, inverse logit transformation was conducted to enable reporting of proportions. The I² statistic was used to quantify heterogeneity across ESs. Pooled compliance rates were initially explored for combined nonclinical and clinical data sets and then compared between nonclinical and clinical studies.
To explore the relationships between the pooled compliance rates (nonclinical and clinical data sets) and EMA protocol factors (ie, monitoring duration, prompt frequency, device type, training, incentives, and burden score), random effects analysis of variance was conducted as part of the meta-analysis program.
Moderator analyses were conducted separately for nonclinical and clinical pooled compliance.

Results
Overview

Objective 1: Health-Related Behaviors and Psychological Constructs Assessed With mEMA
Using the primary mEMA target, data sets were grouped into 2 broad domains: Behavior or Psychological construct. Within the Behavior domain, the Other category reflects 7 single studies where the primary mEMA target did not align with more common behavior targets: social interactions/activities [26,27], sexual [28], leisure [29], nonsuicidal self-injurious [30], HIV prevention [31], and oral behaviors [32].
The most frequent primary mEMA target across all domains for nonclinical and clinical data sets was affect. Definitions of cohort compliance varied across studies: the proportion of answered prompts across combined [34] or separate tasks [35], the number of completed protocol days [36], the total number of prompts (data) available [37], or the proportion of completed questions/items per prompt [38]. Cohort compliance reported before data exclusions ranged from 38% to 98% (median 82%) and after data exclusions from 50% to 97% (median 81%; Table 1).

Objective 3: Associations Between Key Features of mEMA Protocols and mEMA Compliance
Of the 105 data sets included in this review, 68 (64.8%) reported sufficient data for inclusion in the meta-analysis (nonclinical=41, clinical=27). Three studies included more than 1 data set and reported compliance ESs for each (data sets n=2 [23], n=3 [20], and n=4 [21]). Sensitivity analysis was undertaken to explore the impact of double counting of mEMA protocol factors within the meta-analysis, where multiple ESs were reported within single studies. When a single ES was retained for each of these studies (lowest ES of the 2 [23], median of 3 [20], ES closest to the average for 4 [21]), the pooled 62 ESs (81.3%, 95% CI 78.2-84.2) and reported variance (I²=98%) were essentially the same as the full data set (68 ESs: 81.9%; 95% CI 79.1-84.4; I²=98%). To ensure that subgroup analysis was not affected, all analyses were conducted without duplicate ESs, and all relationships were consistent with those of the full data set.
For nonclinical studies, 2 factors (prompt frequency and items/prompt) were significantly related to mEMA compliance. For prompt frequency, the overall model was nonsignificant (P=.07), but the coefficient was significant (P<.001). Prompting 1 to 3 times per day was associated with higher compliance (87%; 95% CI 82.5-90.4) compared with studies with more than 3 prompts per day (76.9%) and 6 or more prompts per day (79.4%). The number of items per prompt was significant for both the overall model (P=.04) and the coefficient (P<.001).
For clinical data sets (n=27), no factors were significantly related to compliance. The number of items per prompt approached significance (P=.05), with compliance appearing lower in studies with 9.5-26 items per prompt (71.1%; 95% CI 62.5-78.6). Significant heterogeneity was reported for all significant findings (nonclinical and clinical), with I² values in excess of 90%, suggesting that although some variance can be explained by the significant factors, a large amount remained unexplained. The burden score was not significantly related to compliance. Compliance proportions from the meta-analysis moderator analyses are presented in Table 2.

Principal Findings
This systematic review of observational studies aimed to describe protocols and compliance with mEMA for self-reported health-related behaviors and psychological constructs in adults. Across 105 unique data sets, the key findings of this review were as follows: (1) a variety of health-related behaviors and psychological constructs were assessed, with affect being the most common mEMA target; (2) mEMA protocols varied widely across studies; (3) compliance was inconsistently reported across studies; (4) meta-analysis estimated an overall compliance rate of 81.9% (95% CI 79.1-84.4), with no significant difference between nonclinical and clinical data sets or between estimates before and after data exclusions; (5) compliance was associated with prompts per day and items per prompt (nonclinical); and (6) no compelling relationship was identified between key features of mEMA protocols representing burden and mEMA compliance.

mEMA Use in Adults for Health-Related Behaviors and Psychological Constructs
The mEMA targets identified in this review reflect those reported in previous systematic reviews: affect/mood [7,12,14,15], cognitions [13], symptoms [15], eating or dietary behaviors [10,11], physical activity [10], and smoking or alcohol consumption [5,6]. Likewise, clinical populations identified in this review (psychiatric or mental health conditions, chronic pain and fibromyalgia, eating disorders, and substance use) were generally consistent with those reported previously [5,7,11,12,14-16]. However, there were chronic conditions unique to this review: oral or dental health, cancer, stroke and traumatic brain injury (for each n=3, 9/41, 22%), HIV, and upper abdominal surgery (for each n=1, 2/41, 5%). The small number of studies identified for these clinical groups may suggest that the potential for mEMA has not yet been realized in these populations.

Reporting of mEMA Protocols and Compliance
Most studies included in this review provided information around the EMA protocol used (device, monitoring duration, frequency and type of prompts, provision of training, and use of incentives). Consistent with previous systematic reviews of both youth and adults, there was considerable heterogeneity across studies for EMA protocols (Multimedia Appendix 3). Heterogeneity may be expected given the various potential applications of this survey approach. The mEMA protocol required to obtain sufficient or appropriate self-reported data on daily habitual behaviors in the general population is not likely to be the same as that for obtaining self-reported data on psychological responses to events or stimuli in clinical contexts. For example, the average EMA monitoring duration for studies of nonclinical adults in this review was 7 days (range: 1-49 days) compared with 12 days (range: 1-182 days) for clinical populations and 30 days (range: 3-730 days) in a review of EMA in substance users [5]. Likewise, prompt type, frequency, and complexity are expected to differ depending on the EMA target and population. Reviews of studies of EMA for diet and physical activity (common behaviors) report a daily average prompt frequency of 20 [10] compared with less than 4 prompts per day in substance use [5]. For these reasons, in systematic reviews of EMA use-including this one-reporting of summary metrics (mean, SD, median, range) for protocol components could be interpreted as a reflection of diversity in EMA application rather than a lack of protocol standardization.
The same rationale cannot be applied to the inconsistencies identified in reporting of EMA protocol compliance. Compliance is problematic to determine for event-based prompts (eg, those completed with smoking or consumption of alcohol). Compliance for time-based notifications, especially when the EMA is conducted using mobile devices, is relatively simple (number of prompts answered out of the total number of prompts delivered). However, participants may respond to a notification but may not complete all survey items or may not respond in a timely manner, affecting the momentary aspect of the EMA. In both of these cases, the act of responding might appropriately contribute to compliance rates, but the data are unlikely to be valid. These concepts were evident in the earliest recommendations for reporting compliance in EMA studies [17], which predate the sampling frame of this systematic review (2006-2016 inclusive). Considering that 71 studies were excluded from this review because they did not report mEMA compliance, less than half of the studies included in this review complied with recommendations put forward by Stone and Shiffman [17], such as reporting the proportion of delivered prompts answered (43/105, 41%) or defining a criterion for valid EMA data (37/105, 35%). Similarly, less than half of the data sets included in this review reported an average number of prompts answered per person (44/105, 42%), as recommended by more recently published guidelines for reporting EMA [8,10].
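The cohort-level compliance calculation for time-based prompts described above can be sketched as follows (the function name and the cohort values are ours, for illustration only):

```python
def cohort_compliance(prompts_answered, prompts_delivered):
    """Cohort compliance (%): answered time-based prompts / delivered prompts.

    prompts_answered, prompts_delivered: per-participant prompt counts.
    Note: this treats any response as compliant; it cannot distinguish
    incomplete surveys or late responses, the validity concern raised above.
    """
    return 100 * sum(prompts_answered) / sum(prompts_delivered)

# Hypothetical cohort of 3 participants on a 7-day, 5-prompt/day protocol
delivered = [35, 35, 35]
answered = [30, 28, 33]
print(round(cohort_compliance(answered, delivered), 1))  # 86.7
```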
With the growth of systematic review methodologies (meta-synthesis, meta-regression, etc), one aspect of reporting for EMA warrants further consideration. EMA allows collection of self-report data across multiple survey items reflecting a range of behavioral, psychological, and contextual factors. It is not uncommon for data collected in the original, primary study to be reported in several publications. The foci of these offspring publications may include the total original sample of participants recruited (eg, unpublished data for specific mEMA items or other variables) or explore a subset of the original study participants (eg, patterns associated with participant characteristics). Although this is a reasonable and defensible use of the original study's resources, identification of duplicate or overlapping data in studies can be problematic. Where ambiguity exists, contacting the study authors is one way to clarify which publication should be considered the primary report (and which report overlapping or duplicate data). However, this option becomes less practical as time and people move on. The alternative is for authors to include an explicit statement concerning the existence of publications that include overlapping or duplicate data. There were a number of exemplars of this aspect of reporting in studies both included [67,68,96] and excluded [115-118] from this review.

Associations Between Key Components of mEMA Protocols and Compliance: Meta-analysis
In our meta-analysis (68 data sets), which replicated and was guided by the authors of 2 previous meta-analyses on this topic [5,9], the overall compliance rate was 81.9% (95% CI 79.1-84.4). For nonclinical data sets, prompt frequency per day and the number of items per prompt were significantly related to compliance (noting that it is not unusual for coefficients derived within a model to be significant even when the overall model is not). However, the findings are likely affected by the number of data sets in some categories. For nonclinical data sets, frequencies of 1-3 prompts per day were associated with small but significantly higher mean cohort compliance. Higher compliance with a lower number of prompts perhaps seems intuitive, yet the evidence is inconsistent. Wen et al [9] reported opposite patterns of significance when nonclinical and clinical population data were investigated, and Jones et al [5] and Ono et al [119] reported no relationship between prompt frequency and compliance among substance users and those affected by chronic pain, respectively.
The relationship between the number of items included within each prompt and compliance has not been explored in previous systematic reviews or meta-analyses of mEMA. In this review, the number of items respondents were required to complete in a standard prompt ranged from 1 to 73 (median 10), with a greater number of items more common in the mEMA of psychological constructs (Multimedia Appendix 3). Our analysis showed an intuitive relationship with compliance among nonclinical data (ie, ≥26 items per prompt had the lowest mean cohort compliance of 63%; 95% CI 42.3-79.7), but not with clinical data.
When aiming to identify protocol factors affecting compliance, the inconsistencies in reporting of EMA compliance and the likely publication bias (studies with lower compliance rates may not be submitted or accepted for publication) must also be considered [5]. These factors limit the inclusion of potentially eligible studies in meta-analyses (68/105, 64.8% of data sets in this review; 36/42, 86% of studies in a previous review [9]). In addition, studies included in meta-analyses privilege best compliers through exclusion of participants not meeting criteria for valid EMA data or compliance thresholds (determined a priori or post hoc). Jones et al [5] attempted to address this latter point by exploring protocol factors associated with participant data exclusions (monitoring duration and prompt frequency). Finally, aggregate-level compliance may not be sensitive enough or provide sufficient resolution to identify factors associated with higher or lower compliance. While accepting these caveats, there are 2 ways to consider the results of the 3 meta-analyses undertaken by Wen et al [9], Jones et al [5], and this study:
1. There is insufficient resolution to identify associations, if they exist, at the aggregate data level.
2. Although confidence limits might be reduced by adding further studies, the meta-analyses are essentially correct, and the notion of protocol burden imposed on participants has little to no impact on compliance [4,5].
In studies using EMA, the issue of what constitutes an acceptable rate of compliance or missing data is debatable.
Although several studies included in this review cite a criterion or commonly used threshold of 80%, we, similar to Jones et al [5], could not identify the derivation of this criterion. For authors currently planning, conducting, or writing papers or protocols on EMA to monitor health-related behaviors or psychological constructs, adequate recording and reporting of compliance data following recommendations by Liao et al [10] and Heron et al [8] should enable future meta-analyses to explore protocol factors affecting participant compliance rates.
This systematic review prospectively aimed to sample a decade of mEMA use (protocol registered in November 2016; sampling frame of 2006 to 2016) in observational studies including adults from clinical and nonclinical populations. As one of the first EMA reporting documents was published in 2002 [17], this sampling frame assumed that researchers planning or reporting studies including mEMA would be aware of these reporting recommendations. The time frame required for the uptake of EMA reporting recommendations is unknown, although estimates of the time required for uptake of translational research range from 2 to 17 years [120]. Our sampling frame and review, however, do not capture studies published from 2017 to date. It is possible that more recent publications differ from those included in our review (greater mobile phone use, better reporting of mEMA schedules, and compliance).
There are no universally accepted recommendations concerning the updating of systematic searches or incorporation of newer studies into review results. Systematic reviews, depending on the specific question and volume of studies eligible for inclusion, are time- and labor-intensive. For larger reviews, it is not uncommon for these to take >2 years [121], with updates of Cochrane Collaboration systematic reviews taking up to 3.3 years [122]. The current Cochrane Collaboration policy implies that the decision to update a systematic review should consider the importance of the review question and the volume of new information (studies) [122]. Early in the review process (postsearch completion), 2 papers were identified, published in 2016 [10] and 2017 [8], providing updated recommendations for EMA reporting. Although the volume of mEMA studies published from 2017 is substantial and growing, we opted not to undertake an updated search/meta-analysis so as to quarantine mEMA studies published before the availability of the more recent EMA reporting recommendations.

Strengths and Limitations
This review was strengthened by the broad eligibility criteria used, including studies across nonclinical and clinical contexts in adults. The meta-analysis method was replicated from previous studies [5,9], enabling direct comparison of findings.
To the best of the authors' knowledge, this review is the first to propose and explore burden as a compound effect of the various EMA factors (monitoring duration, prompt frequency and prompt type, item per prompt) on participant compliance. We have proposed this novel metric as a starting point for conversations, critique, and further development. In its current form, the burden metric does not include all factors likely to contribute to burden (unfamiliarity with technology, adjunctive use of wearable technologies such as accelerometers), the proposed weighting is rudimentary, and the accuracy of study design features was not confirmed by the study authors.
Limitations of this review include a search strategy focused on the use of mEMA and excluding interventions delivered using EMA (EMI). Consequently, the findings of this review should not be extrapolated or assumed to be similar in studies using EMI. Most studies included in this review provided a clear statement of the primary outcome of interest within each observational study, and we are confident that our categorization of primary mEMA targets is defensible. However, when observational studies did not clearly identify or infer a primary outcome of interest, and given that mEMA surveys can include multiple items for both self-reported behavioral and psychological constructs, misclassification of mEMA targets as primary or secondary may exist for a small number of studies. In the absence of explicit statements by the authors on the number of items within each standard notification, we adopted a conservative approach by estimating the minimum compulsory number of items based on either the information provided by authors within publications or reviewing the instruments reported by authors for inclusion within surveys. The impact of including only studies published in English is unknown.

Conclusions
This review suggests that there is substantial interest in the use of mEMA in adults to collect self-reported health-related behavior and psychological construct data in nonclinical and clinical contexts. Across mEMA studies, there was considerable heterogeneity in protocol design, which may reflect a concerted effort by researchers to tailor mEMA protocols for the intended target and/or population to facilitate compliance. However, the small proportion of studies reporting participant compliance with EMA is concerning. As a result of absent or underreported compliance, pooled compliance rates may be skewed in favor of overall higher EMA compliance rates. This may dampen associations between compliance rates and EMA protocol factors or burden, making it difficult to ascertain which, if any, protocol factors (such as prompt frequency and number of items within prompts, as identified in this analysis) improve compliance and data collection.