This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
Missing data are common in mobile health (mHealth) research. There has been little systematic investigation of how missingness is handled statistically in mHealth randomized controlled trials (RCTs). Although some missing data patterns (ie, missing at random [MAR]) may be adequately addressed using modern missing data methods such as multiple imputation and maximum likelihood techniques, these methods do not address bias when data are missing not at random (MNAR). It is typically not possible to determine whether the missing data are MAR. However, higher attrition in active (ie, intervention) versus passive (ie, waitlist or no treatment) conditions in mHealth RCTs raises a strong likelihood of MNAR, such as if active participants who benefit less from the intervention are more likely to drop out.
This study aims to systematically evaluate differential attrition and methods used for handling missingness in a sample of mHealth RCTs comparing active and passive control conditions. We also aim to illustrate a modern model-based sensitivity analysis and a simpler fixed-value replacement approach that can be used to evaluate the influence of MNAR.
We reanalyzed attrition rates and predictors of differential attrition in a sample of 36 mHealth RCTs drawn from a recent meta-analysis of smartphone-based mental health interventions. We systematically evaluated the design features related to missingness and its handling. Data from a recent mHealth RCT were used to illustrate 2 sensitivity analysis approaches (pattern-mixture model and fixed-value replacement approach).
Attrition in active conditions was, on average, roughly twice that of passive controls. Differential attrition was higher in larger studies and was associated with the use of MAR-based multiple imputation or maximum likelihood methods. Half of the studies (18/36, 50%) used these modern missing data techniques. None of the 36 mHealth RCTs reviewed conducted a sensitivity analysis to evaluate the possible consequences of MNAR data. Pattern-mixture model and fixed-value replacement sensitivity analysis approaches were introduced. Results from a recent mHealth RCT were shown to be robust to missing data, reflecting worse outcomes in missing versus nonmissing scores in some but not all scenarios. A review of such scenarios helps to qualify the observations of significant treatment effects.
MNAR data because of differential attrition are likely in mHealth RCTs using passive controls. Sensitivity analyses are recommended to allow researchers to assess the potential impact of MNAR on trial results.
In the world of mobile health (mHealth), high and rapid attrition is the rule rather than the exception [
Attrition in research contexts typically results in missing data. Some exceptions to this may include measures that continue to be assessed regardless of ongoing study participation (eg, smartphone app usage). Nonetheless, decades of methodological work have focused on characterizing the various types of missing data and developing statistical approaches for handling missingness [
Missing data that are MCAR and MAR are relatively straightforward to handle (so-called
Unlike MCAR and MAR missingness patterns, MNAR cannot be easily handled in a confident manner. Moreover, MNAR can have multiple causes, making it difficult to develop a single method that can universally address it, even in a single study. As a result, some form of sensitivity analysis is recommended to understand the possible effects of MNAR [
Other methods have been proposed to assess the influence of MNAR data. The combination of high attrition and data that are potentially MNAR is not unique to mHealth research, and several approaches have come from the addiction field [
Hedeker et al [
Despite a longstanding acknowledgment that missing data are common in mHealth research [
As noted above, a pattern of missingness that may be suggestive of MNAR in mHealth research is when missingness is higher in an active condition relative to a passive (eg, waitlist) control group. The context of an RCT is important for making this claim. Random assignment should produce groups balanced on all relevant covariates at baseline, including those that would predict dropout [
A recent meta-analysis of attrition in RCTs testing smartphone-based mental health interventions [
In addition to further clarifying how differential attrition and potential MNAR are handled within the mHealth literature, there is a need to understand the potential implications of MNAR for study outcomes. Selection models and pattern-mixture models are 2 promising approaches. Sensitivity analyses such as those recommended by Hedeker et al [
This study has 2 primary aims. The first is to systematically review the analytic methods used to address missingness in a portion of the mHealth literature that has previously shown indications of potential MNAR. We examined 36 RCTs drawn from Linardon and Fuller-Tyszkiewicz’s [
Our second aim is to present methods for evaluating the effects of MNAR that may be relevant to mHealth research. We illustrate the value of sensitivity analyses by applying an MI-based pattern-mixture model along the lines of Hedeker et al [
To evaluate study design features associated with differential attrition and the methods used to handle missing data, we reanalyzed and systematically reviewed RCTs included in the meta-analysis of attrition in smartphone-based mental health interventions by Linardon and Fuller-Tyszkiewicz [
Log ORs and their variances were then aggregated using a random-effects meta-analysis, weighted as is typical by inverse variance [
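The aggregation step can be sketched in code. The Python functions below are our illustration, not the published analysis (which would typically use R); they compute a study-level log OR of dropout with a 0.5 continuity correction for empty cells and pool the estimates with the DerSimonian-Laird random-effects estimator:

```python
import math

def log_or(drop_tx, n_tx, drop_ctl, n_ctl):
    """Log odds ratio of dropout (active vs passive) and its variance.
    2x2 cells: a = dropped tx, b = retained tx, c = dropped ctl, d = retained ctl."""
    a, b = drop_tx, n_tx - drop_tx
    c, d = drop_ctl, n_ctl - drop_ctl
    if 0 in (a, b, c, d):                      # 0.5 continuity correction
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    yi = math.log((a * d) / (b * c))
    vi = 1 / a + 1 / b + 1 / c + 1 / d         # Woolf variance of the log OR
    return yi, vi

def dersimonian_laird(effects):
    """Pooled random-effects estimate from (log OR, variance) pairs."""
    y = [e[0] for e in effects]
    v = [e[1] for e in effects]
    w = [1 / vi for vi in v]                   # inverse-variance (fixed) weights
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)    # between-study variance
    w_re = [1 / (vi + tau2) for vi in v]       # random-effects weights
    return sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
```

For example, feeding in the active and passive dropout counts from the studies tabulated below (eg, `log_or(146, 234, 25, 78)` for the first study) and passing the resulting pairs to `dersimonian_laird` yields a pooled log OR in the same units as the forest plot.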
We systematically reviewed several features of the included studies. These included the overall sample size, overall dropout rate, whether potential differential attrition was statistically evaluated (ie, comparing dropout rates for active vs passive conditions), whether differential attrition was detected, the approach used for handling missing data, whether a modern MAR data analytic approach was used (ie, MI or ML), and whether a sensitivity analysis was conducted to evaluate the potential impact of MNAR data. To evaluate whether these study characteristics were linked with differential attrition, we tested them as moderators [
We used data from a recently conducted RCT testing a smartphone-based mental health intervention [
The first approach is a variant of the pattern-mixture model [
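The core of this strategy is to impute missing posttest scores under MAR and then shift the imputed values for dropouts by a fixed offset (in residual SD units) toward worse outcomes before re-estimating the treatment effect. A minimal Python sketch of the idea follows; it is our illustration under stated simplifications (the published analysis used MI in R, and `impute_mar` here is a deliberately simplified, "improper" MI that ignores parameter uncertainty):

```python
import numpy as np

rng = np.random.default_rng(0)

def impute_mar(pre, post, group, miss, n_imp=20):
    """Simplified MI: fit a normal linear model for posttest on pretest and
    group among completers, then draw missing posttest values from the
    fitted predictive distribution (parameter uncertainty ignored)."""
    obs = ~miss
    Xo = np.column_stack([np.ones(obs.sum()), pre[obs], group[obs]])
    beta, *_ = np.linalg.lstsq(Xo, post[obs], rcond=None)
    resid_sd = float(np.std(post[obs] - Xo @ beta, ddof=3))
    Xm = np.column_stack([np.ones(miss.sum()), pre[miss], group[miss]])
    imps = []
    for _ in range(n_imp):
        filled = post.copy()
        filled[miss] = Xm @ beta + rng.normal(0, resid_sd, miss.sum())
        imps.append(filled)
    return imps, resid_sd

def pattern_mixture_effect(pre, post, group, miss, delta_sd):
    """ANCOVA treatment effect (group coefficient) after shifting imputed
    values for dropouts by delta_sd residual SDs toward worse outcomes,
    averaged across imputations (Rubin's rules point estimate)."""
    imps, resid_sd = impute_mar(pre, post, group, miss)
    X = np.column_stack([np.ones(len(pre)), group, pre])
    coefs = []
    for filled in imps:
        shifted = filled.copy()
        shifted[miss] += delta_sd * resid_sd   # the pattern-mixture offset
        beta, *_ = np.linalg.lstsq(X, shifted, rcond=None)
        coefs.append(beta[1])
    return float(np.mean(coefs))
```

Setting `delta_sd=0` recovers the MAR analysis; increasing it (eg, 0.2 through 1.4, as in the tables below) traces how quickly the estimated treatment effect attenuates when dropouts are assumed to have fared progressively worse than their imputed values.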
A possible limitation of this approach is that, in the absence of covariates useful for predicting missing outcomes, all missing observations will be generated with large SDs, implying a high degree of uncertainty in the missing outcomes. Thus, even when introducing the offset parameter following the pattern-mixture strategy, the observed variability in the missing observations will still be large. To the extent that we should not confuse lack of knowledge about missing outcomes with actual variability in the missing outcomes, a sensitivity analysis that also considers a fixed-value replacement for missing observations can be useful. Therefore, we also applied a second sensitivity analysis approach outside the context of MI. Consistent with our strategy, this second approach focuses on estimating residualized change scores, although simple change scores could also be used. Once residualized change scores are imputed for missing cases, nonparametric tests (eg, Wilcoxon signed-rank test) can then be conducted using these values to compare changes in the active and passive conditions while avoiding statistical drawbacks associated with conducting parametric tests using single imputed data (eg, artificially deflating SE by treating imputed values as if they were observed values [
For the worst-case scenario, the missing data were assumed to reflect the worst possible observed outcome. Residualized change is operationalized as the observed posttreatment score minus the predicted posttreatment score based on pretreatment. For an outcome such as distress, in which lower values are preferred (ie, lower distress), a larger (ie, more positive) residual indicates a smaller decline in distress (for negative values), or even an increase in distress over time (for positive values). For an outcome in which higher scores were better (eg, wellbeing), one would simply reverse this approach (ie, replace missing values with the minimum observed residual). In our example, the worst-case scenario replaces the missing values with the maximum value of the observed residualized change scores: r_missing = max(r_observed).
We then evaluated possibilities between the complete and worst-case scenarios, with missing values imputed to be 0.20, 0.50, and 0.80 SD from the mean residualized change score. These specific values were chosen to reflect small, medium, and large deviations based on Cohen [
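The fixed-value replacement steps above can be sketched compactly. This Python sketch is our illustration (the article's own code, provided in the multimedia appendix, is in R); function names are ours, and the pooled-rank comparison below is a simplified stand-in for the rank-based test used in the article, with ties ignored:

```python
import numpy as np

def residualized_change(pre, post):
    """Observed posttest minus posttest predicted from pretest via OLS."""
    b1, b0 = np.polyfit(pre, post, 1)
    return post - (b0 + b1 * pre)

def fixed_value_fill(resid_obs, n_missing, scenario):
    """Append fixed replacement values for missing cases to the observed
    residualized change scores. scenario='worst' uses the maximum observed
    residual (least improvement when lower scores are better); a numeric
    scenario k uses mean + k*SD of the observed residuals."""
    if scenario == "worst":
        fill = resid_obs.max()
    else:
        fill = resid_obs.mean() + scenario * resid_obs.std(ddof=1)
    return np.concatenate([resid_obs, np.full(n_missing, fill)])

def mean_rank_by_group(active, passive):
    """Mean pooled rank per group; a lower mean rank indicates larger
    relative decreases in distress. Ties are ignored for simplicity."""
    pooled = np.concatenate([active, passive])
    ranks = pooled.argsort().argsort() + 1.0
    return ranks[:len(active)].mean(), ranks[len(active):].mean()
```

Running `fixed_value_fill` with `"worst"`, `0.2`, `0.5`, and `0.8` for each condition and comparing the resulting mean ranks reproduces the logic of the sensitivity scenarios reported below.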
For example, psychological distress in the RCT by Goldberg et al [
Linardon and Fuller-Tyszkiewicz’s [
Attrition rates and study design characteristics.
| Study | Tx^{a} ITT^{b} | Tx drop | WL^{c} ITT | WL drop | Diff^{d} | Method^{e} | Multiple imputation | Maximum likelihood |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Bakker et al [ | 234 | 146 | 78 | 25 | N/A^{f} | ANOVA^{g} | Yes | No |
| Bidargaddi et al [ | 192 | 106 | 195 | 88 | Yes, higher in active | | Yes | No |
| Bostock et al [ | 128 | 5 | 110 | 4 | N/A | ANOVA | No | No |
| Carissoli et al [ | 20 | 0 | 18 | 0 | N/A | ANOVA | N/A | N/A |
| Champion et al [ | 38 | 9 | 36 | 3 | No | MLM^{h} | Yes | Yes |
| Enock et al [ | 206 | 38 | 36 | 0 | N/A | MLM | No | Yes |
| Faurholt-Jepsen et al [ | 39 | 6 | 39 | 5 | N/A | MLM | No | Unclear^{i} |
| Hall et al [ | 76 | 34 | 25 | 13 | N/A | MLM | No | Unclear |
| Horsch et al [ | 74 | 29 | 77 | 15 | N/A | MLM | Yes | Unclear |
| Ivanova et al [ | 101 | 20 | 51 | 4 | N/A | MLM | No | Yes |
| Kahn et al [ | 80 | 1 | 80 | 0 | N/A | | No | No |
| Krafft et al [ | 67 | 15 | 31 | 5 | N/A | MLM | No | Yes |
| Kristjansdottir et al [ | 70 | 23 | 70 | 33 | N/A | | No | No |
| Kuhn et al [ | 62 | 11 | 58 | 6 | No | ANOVA | Yes | No |
| Lee and Jung [ | 102 | 25 | 104 | 18 | N/A | ANOVA | No | No |
| Levin et al [ | 12 | 0 | 11 | 0 | N/A | MLM | No | Unclear |
| Levin et al [ | 59 | 13 | 28 | 5 | No | MLM | No | Unclear |
| Lüdtke et al [ | 45 | 10 | 45 | 6 | No | ANOVA | Yes | No |
| Lukas and Berking [ | 16 | 2 | 15 | 2 | N/A | ANOVA | No | No |
| Ly et al [ | 36 | 3 | 37 | 2 | N/A | MLM | No | Yes |
| Ly et al [ | 14 | 0 | 14 | 0 | N/A | MLM | No | Yes |
| Marx [ | 46 | 2 | 50 | 0 | N/A | ANOVA | No | No |
| Miner et al [ | 25 | 2 | 24 | 3 | N/A | ANOVA | Yes | No |
| Moëll et al [ | 29 | 3 | 28 | 1 | N/A | ANOVA | No | No |
| Oh et al [ | 39 | 1 | 20 | 4 | N/A | ANOVA | No | No |
| Pham et al [ | 31 | 14 | 32 | 7 | N/A | ANOVA | No | No |
| Proudfoot et al [ | 242 | 116 | 230 | 32 | Yes, higher in active | MLM | Yes | Yes |
| Roepke et al [ | 190 | 152 | 93 | 57 | Yes, higher in active | MLM | No | Yes |
| Rosen et al [ | 57 | 17 | 55 | 7 | Yes, higher in active | MLM | No | Yes |
| Schlosser et al [ | 22 | 3 | 21 | 0 | N/A | ANOVA | No | No |
| Stjernsward and Hansson [ | 196 | 60 | 202 | 42 | N/A | ANOVA | Yes | No |
| Stolz et al [ | 60 | 18 | 30 | 7 | No | MLM | Yes | Yes |
| Tighe et al [ | 31 | 2 | 30 | 0 | N/A | ANOVA | No | No |
| van Emmerik et al [ | 191 | 111 | 186 | 45 | Yes, higher in active | MLM | Yes | Unclear |
| Versluis et al [ | 46 | 9 | 42 | 3 | Yes, higher in active | MLM | No | Unclear |
| Yang et al [ | 45 | 3 | 43 | 4 | N/A | ANOVA | No | No |
^{a}Tx: active treatment conditions.
^{b}ITT: intention-to-treat sample size; drop: attrition at posttreatment assessment.
^{c}WL: waitlist (or no treatment control condition).
^{d}Whether differential attrition was tested and, if so, whether a between-group difference was detected.
^{e}Primary data analysis method.
^{f}N/A: not applicable (because of lack of missing data or differential attrition test not conducted).
^{g}ANOVA: analysis of variance or related method (eg, analysis of covariance).
^{h}MLM: multilevel model.
^{i}Unclear whether multiple imputation estimator was used.
Most studies used multilevel modeling (17/36, 47%) or a variant of analysis of variance (eg, analysis of covariance, multivariate analysis of variance; 16/36, 44%) as the primary analytic approach, with 8% (3/36) of studies using a
Consistent with Linardon and Fuller-Tyszkiewicz [
Forest plot displaying results of the meta-analysis. Effect sizes are in log-odds units, with larger values indicating higher attrition in active conditions relative to passive conditions. The size of points indicates relative weight in the meta-analysis (ie, inverse variance). RE: random effects.
Several potential moderators were assessed using a meta-regression analysis. Active conditions were more likely to show higher attrition than passive conditions as the overall sample size increased (B=0.0022, 95% CI 0.0005-0.0039; note that all meta-regression coefficients are in log OR units;
Results of meta-regression indicating that larger studies are associated with higher rates of differential attrition (ie, higher attrition in active vs passive conditions). Points are displayed relative to their weight in the meta-regression model (ie, inverse variance).
Selection and pattern-mixture models are 2 valuable modeling strategies for handling MNAR (see
Of the 343 participants randomized, 228 (66.5%) were assigned to 1 of the 2 active conditions, and 115 (33.5%) were assigned to the waitlist control. Consistent with the possibility of MNAR, noncompletion of posttreatment assessments was higher in the active condition (137/228, 60.1%) than in the waitlist condition (48/115, 41.7%; OR 2.10, 95% CI 1.34-3.33;
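The dropout OR and its CI can be reproduced directly from the reported counts. The short Python sketch below (our illustration; the function name is ours) computes a Wald CI on the log scale, so the upper bound may differ slightly from the published value if a different CI method or rounding was used:

```python
import math

def odds_ratio_ci(drop_a, n_a, drop_b, n_b, z=1.96):
    """Odds ratio of dropout (group A vs group B) with a Wald 95% CI
    computed on the log scale."""
    a, b = drop_a, n_a - drop_a            # dropped / retained, group A
    c, d = drop_b, n_b - drop_b            # dropped / retained, group B
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log OR
    lo, hi = (math.exp(math.log(or_) + s * z * se) for s in (-1, 1))
    return or_, lo, hi
```

With the trial's counts, `odds_ratio_ci(137, 228, 48, 115)` returns an OR of about 2.10 with a CI of roughly 1.33 to 3.32, in line with the reported OR 2.10 (95% CI 1.34-3.33).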
Results of pattern-mixture model sensitivity analysis based on multiple imputation^{a}.
| Model | Estimate^{b} | P value |
| --- | --- | --- |
| MAR^{c} | −0.34 | .002 |
| 0.20 | −0.31 | .004 |
| 0.50 | −0.28 | .01 |
| 0.80 | −0.24 | .03 |
| 1.10 | −0.20 | .08 |
| 1.40 | −0.17 | .16 |
^{a}Models are based on varying assumptions regarding the meaning of missingness. Multiply imputed posttest values based on 100 imputations are offset [
^{b}Coefficient for active group status (vs waitlist) predicting posttest distress scores controlling for pretest distress scores pooled across imputed data sets.
^{c}MAR: missing at random (with no offset applied to posttest values).
Pre- and posttreatment scores for active and passive conditions with varying constant offset parameters added to multiply imputed values for missing outcomes under conditions of missing not at random (ie, Missing). Values are in z-score units, scaled by distress at baseline (mean 0, SD 1). Panels illustrate trajectories with offsets ranging from 0.2 to 1.4 residual SD. The missing at random panel represents values derived using multiple imputation with no offset applied. MAR: missing at random; WL: waitlist.
We now turn to the results of the fixed-value replacement sensitivity analysis.
Pre- and posttreatment scores for active and passive conditions under varying missing not at random conditions using fixed-value replacement of missing values. Pretreatment values represent z-scaled distress at baseline (mean 0, SD 1). Posttreatment values vary across plots. For Comp Raw, posttreatment values are posttreatment distress scaled based on baseline distress. Subsequent plots display residualized change scores z-transformed at posttreatment to aid in visual interpretation of relative, between-group pre-post change. Comp Resid computed posttreatment as baseline plus residualized change scores for completers only. Worst Resid replaced missing posttreatment Comp Resid values with the lowest improvement in distress. Subsequent figures (0.2, 0.5, 0.8) replaced missing posttreatment Comp Resid values with values 0.2, 0.5, and 0.8 SD worse than the mean residual. Comp: completer; Resid: residualized change; WL: waitlist.
For null hypothesis testing purposes, we used nonparametric tests of mean residualized change scores. Consistent with the multilevel modeling results [
Results of fixed-value replacement sensitivity analysis.
| Group and model | Sample size, n (%) | Mean rank (SD) | SE | P value |
| --- | --- | --- | --- | --- |
| Active | | | | |
| Comp^{b} | 91 (39.9) | 69.11 (43.01) | 4.51 | <.001 |
| Worst^{c} | 228 (100) | 178.1 (93.05) | 6.16 | .08 |
| 0.20^{d} | 228 (100) | 161.89 (83.25) | 5.51 | .004 |
| 0.50^{d} | 228 (100) | 165.19 (84.25) | 5.58 | .05 |
| 0.80^{d} | 228 (100) | 168.52 (86.01) | 5.70 | .32 |
| Waitlist | | | | |
| Comp | 67 (58.3) | 93.61 (45.9) | 5.61 | N/A^{e} |
| Worst | 115 (100) | 159.9 (85.62) | 7.98 | N/A |
| 0.20^{d} | 115 (100) | 192.05 (102.27) | 9.54 | N/A |
| 0.50^{d} | 115 (100) | 185.5 (102.26) | 9.54 | N/A |
| 0.80^{d} | 115 (100) | 178.9 (100.34) | 9.36 | N/A |
^{a}
^{b}Comp: completer sample.
^{c}Worst: worstcase scenario, which assumed missing values are equivalent to the worst outcome (ie, smallest change in distress).
^{d}0.20, 0.50, 0.80: missing values assumed to be 0.20, 0.50, or 0.80 SDs worse than the mean residualized change score.
^{e}N/A: not applicable.
Results of Wilcoxon signed-rank test using a fixed-value replacement sensitivity analysis across varying missing not at random conditions. A lower mean rank indicates larger relative decreases in distress. Comp: completer sample; Worst: worst-case scenario which assumed missing values are equivalent to the worst outcome (ie, the smallest change in distress); 0.2, 0.5, 0.8: missing values assumed to be 0.2, 0.5, or 0.8 SD worse than the mean residualized change score; error bars: 1.96×SE; WL: waitlist. *
This study had 2 primary aims: to systematically evaluate the handling of a potential source of MNAR data in mHealth research—differential attrition—and to advocate for sensitivity analyses as a family of strategies that might be used to assess the impact of MNAR data. At the broadest level, results suggest that MNAR data are likely to be a problem in mHealth research and one that, to date, has not been adequately addressed. As reported by Linardon and Fuller-Tyszkiewicz [
Although patterns of attrition consistent with potential MNAR data were detected in the literature as a whole, only a minority of the included studies tested for differential attrition. However, in keeping with the literature-wide pattern of differential attrition, 6 of the 11 studies comparing attrition rates between active and passive conditions detected higher attrition in the active conditions, whereas the remaining 5 studies failed to detect a difference. Despite the possibility of MNAR data, none of the 36 studies directly assessed the potential influence of MNAR data on the study results. Half of the studies employed other modern missing data methods, such as MI or ML. These approaches have many strengths and are certainly preferred over historical approaches for handling missingness (eg, last observation carried forward and complete case analysis [
Moderator analyses further characterized the correlates of differential attrition. The results indicated that differential attrition was more likely to occur in larger studies. Unfortunately, this association could produce a pernicious source of bias within the literature, as larger studies are presumably the ones most looked to when evaluating evidence of efficacy and are likely to carry more weight in meta-analyses examining efficacy. Interestingly, studies with higher differential attrition were only marginally more likely to assess differential attrition. It may be that differential attrition is simply not recognized or acknowledged as a potential concern worth assessing, even when dropout rates differ. Somewhat counterintuitively, studies with higher differential attrition were not more likely to detect differential attrition when assessed. This lack of association could be because of the limited statistical power for the moderator test itself [
Perhaps the most notable finding of our review is that none of the included studies conducted a sensitivity analysis to evaluate the potential influence of MNAR data on study findings. Although several meta-analyses suggest that smartphone-based mental health interventions produce benefits relative to waitlist control conditions [
The primary aim of this study is to encourage mHealth researchers to consider sensitivity analyses to assess the potential impact of MNAR missingness, particularly when differential attrition is present. Several modern techniques exist for evaluating the potential impact of MNAR missingness, including the selection models and pattern-mixture models discussed above. As most of these methods have limitations (eg, they are heavily influenced by untestable assumptions) and may not be within the current analytic repertoire of many mHealth clinical trialists, we presented an MI-based pattern-mixture model sensitivity analysis approach adapted from smoking cessation research [
It is worth noting that the 2 sensitivity analysis approaches illustrated in this study provided somewhat discrepant conclusions regarding the degree to which data from Goldberg et al [
This study had several important limitations. The first and broadest limitation is that we cannot definitively conclude that the observed differential attrition necessarily results in MNAR data. It is possible that remaining in the study was driven by factors unrelated to changes in study endpoints (ie, distress). Likewise, dropouts in the active group could have occurred because participants stopped using the smartphone app and lost interest in the study because their psychological symptoms had already improved (as can be the case in psychotherapy [
Several future directions follow naturally from this study. MNAR sensitivity analyses could be integrated into future mHealth RCTs. For trials with longitudinal data, more complex pattern-mixture models may be especially attractive [
Short of incorporating sensitivity analyses into mHealth RCTs, researchers could, at a minimum, test for differential attrition, especially when comparing active and passive conditions. Acknowledging the potential influence of MNAR when differential attrition is present can allow readers to more accurately evaluate study findings in light of this limitation. One way to assess the potential impact of MNAR because of differential attrition would be to reanalyze published mHealth RCTs, especially large trials observed to have higher rates of differential attrition. Reanalyses of this kind could help determine the degree to which findings are sensitive to varying MNAR assumptions, and by extension, the degree to which conclusions drawn from the broader literature may be similarly influenced. Another future direction is intentionally adopting methods that decrease attrition generally [
Attrition is a persistent thorn in the side of mHealth clinical trialists [
Evidence of differential attrition supports the possibility that MNAR may be a common problem in mHealth RCTs with passive controls and one that is largely unacknowledged to date. Sensitivity analyses offer an approach for establishing the impact of differential attrition on the study results.
R code for conducting sensitivity analysis.
Model-based approaches for handling missing not at random data.
MAR: missing at random
MCAR: missing completely at random
mHealth: mobile health
MI: multiple imputation
ML: maximum likelihood
MNAR: missing not at random
OR: odds ratio
RCT: randomized controlled trial
The authors include data from an RCT registered at ClinicalTrials.gov (NCT04139005) and through the Open Science Framework [
RJD is the founder and president of, and serves on the board of directors for, the nonprofit organization Healthy Minds Innovations, Inc.