Published on 20.11.19 in Vol 21, No 11 (2019): November
Assessing the Psychometric Properties of the Digital Behavior Change Intervention Engagement Scale in Users of an App for Reducing Alcohol Consumption: Evaluation Study
Background: The level and type of engagement with digital behavior change interventions (DBCIs) are likely to influence their effectiveness, but validated self-report measures of engagement are lacking. The DBCI Engagement Scale was designed to assess behavioral (ie, amount, depth of use) and experiential (ie, attention, interest, enjoyment) dimensions of engagement.
Objective: We aimed to assess the psychometric properties of the DBCI Engagement Scale in users of a smartphone app for reducing alcohol consumption.
Methods: Participants (N=147) were UK-based, adult, excessive drinkers recruited via an online research platform. Participants downloaded the Drink Less app and completed the scale immediately after their first login in exchange for a financial reward. Criterion variables included the objectively recorded amount of use, depth of use, and subsequent login. Five types of validity (ie, construct, criterion, predictive, incremental, divergent) were examined in exploratory factor, correlational, and regression analyses. The Cronbach alpha was calculated to assess the scale’s internal reliability. Covariates included motivation to reduce alcohol consumption.
Results: Responses on the DBCI Engagement Scale could be characterized in terms of two largely independent subscales related to experience and behavior. The experiential and behavioral subscales showed high (α=.78) and moderate (α=.45) internal reliability, respectively. Total scale scores predicted future behavioral engagement (ie, subsequent login) with and without adjusting for users’ motivation to reduce alcohol consumption (adjusted odds ratio [ORadj]=1.14; 95% CI 1.03-1.27; P=.01), which was driven by the experiential (ORadj=1.19; 95% CI 1.05-1.34; P=.006) but not the behavioral subscale.
Conclusions: The DBCI Engagement Scale assesses behavioral and experiential aspects of engagement. The behavioral subscale may not be a valid indicator of behavioral engagement. The experiential subscale can predict subsequent behavioral engagement with an app for reducing alcohol consumption. Further refinements and validation of the scale in larger samples and across different DBCIs are needed.
J Med Internet Res 2019;21(11):e16197
Some level of engagement with digital behavior change interventions (DBCIs) is necessary for such interventions to be effective. However, observed levels of engagement with DBCIs are often considered too limited to support behavior change [ ]. For example, a systematic review of Web-based health interventions found that approximately 50% of participants engaged with the interventions in the desired manner, with estimates varying between 10% and 90% across trials [ ]. Studies conducted across different settings and target behaviors report a positive association between DBCI engagement and intervention effectiveness [ , ], suggesting that these variables may be linked via a dose-response function [ , ]. However, it is also plausible that individuals who are more successful in changing the behavior targeted by the DBCI engage with DBCIs more [ ] or that a limited amount of engagement is sufficient to bring about meaningful change in some users (ie, “effective engagement”) [ ]. Attempts have been made to characterize the function linking engagement with intervention effectiveness [ , - ], but progress is hindered by the use of different definitions and measures of engagement across studies.
The question of what it means for someone to be engaged with a DBCI has been of interest to psychologists and computer scientists alike. Broadly, psychologists have defined engagement as the extent of technology use, perceived as a proxy for participant exposure to a DBCI’s “active ingredients” or component behavior change techniques [, ]. On the other hand, computer scientists have defined engagement as the subjective experience of “flow” or “immersion” that occurs during the human-computer interaction, characterized by focused attention, intrinsic interest, balance between challenge and skill, losing track of time and self-consciousness, and transportation to a “different place” [ , ]. After having conducted a systematic, integrative literature review of the psychology and computer science literatures [ ] in addition to in-depth interviews with potential DBCI users, our interdisciplinary research team proposed the following working definition of engagement: “[Engagement with a DBCI is] a state-like construct which occurs each time a user interacts with a DBCI, with two behavioral (ie, amount and depth of use) and three experiential (ie, attention, interest and enjoyment) dimensions [ ].”
We hence theorized that two behavioral (amount and depth of use) and three experiential (attention, interest, and enjoyment) dimensions are necessary and sufficient conditions for someone to be engaged with a DBCI. Although related, engagement with DBCIs is thought to be conceptually distinct from both “flow” and pure technology usage. Although several measures of flow, immersion, and technology usage are available for use (for overviews, see [, , ]), an instrument that quantifies the intensity of behavioral and experiential engagement is lacking. For a quantitative scale of engagement to be useful for researchers, practitioners, and developers, it should be able to predict key variables of interest such as future engagement, knowledge acquisition, or intervention effectiveness. In addition, although a number of usage metrics derived from log-data are typically used to capture the intensity of behavioral engagement [ - ], a validated measure of engagement, which captures both the experiential and behavioral dimensions of engagement and could be easily administered without the need to access and process the DBCI’s raw data, would be useful. The DBCI Engagement Scale was developed to fill this gap [ ].
As part of the scale development process (described in detail in ), a pool of initial scale items was developed by the interdisciplinary research team in addition to two “best bets” for a short measure of engagement. Lay and expert respondents were then asked to classify the initial scale items into one of six categories (ie, amount of use, depth of use, interest, attention, enjoyment, plus an unclassified category) to examine the scale’s content validity. The first psychometric evaluation of the 10-item DBCI Engagement Scale was conducted in a sample of adult excessive drinkers who had voluntarily downloaded a freely available, evidence-informed app—Drink Less—for reducing their alcohol consumption [ ]. Results indicated that the behavioral and experiential indicators of engagement may resolve to a single dimension. However, fewer than 5% of eligible users completed the scale during the study, and a sensitivity analysis indicated that the analytic sample was biased toward highly engaged users.
Studying engagement in real-world settings is notoriously difficult, as highly engaged users are more likely to respond to research surveys, potentially biasing results. Moreover, evidence suggests that motivation to change the target behavior is consistently associated with the frequency of behavioral engagement, such as the total number of logins [ , ]. Although motivation to change is a key predictor of engagement, it is neither a necessary nor a sufficient condition for someone to be engaged with a DBCI. For example, a user with low motivation to reduce their alcohol consumption might be intrigued by the design of a specific app, engage with its content, and subsequently become motivated to drink less. Therefore, to better study the dimensional structure of engagement, we considered it important to adjust for motivation to change in our analyses, thus separating the state of engagement from confounding motivations. This study aimed to evaluate the DBCI Engagement Scale in a sample of users recruited via an online research platform in order to address the following research questions:
- What is the factor structure of the DBCI Engagement Scale? (construct validity)
- Is the DBCI Engagement Scale internally reliable? (internal reliability)
- Are total scale scores positively associated with objectively recorded amount of use and depth of use? (criterion validity)
- Do total scale scores predict future behavioral engagement (ie, subsequent login), with and without adjustment for motivation to reduce alcohol consumption? (predictive validity)
- Do two best bets for a short measure of engagement predict future behavioral engagement, with and without adjustment for motivation to reduce alcohol consumption? (predictive validity)
- Does a model including the objectively recorded behavioral and the self-reported experiential indicators of engagement account for more variance in future behavioral engagement (ie, subsequent login) compared with a model including only the objectively recorded behavioral indicators of engagement? (incremental validity)
- Are total scale scores significantly associated with scores on the Flow State Scale? (divergent validity)
The preregistered study protocol can be found in the Open Science Framework. Ethical approval was granted by University College London’s Computer Science Departmental Research Ethics Chair (Project ID: UCLIC/1617/004/Staff Blandford HFDH).
Participants were eligible to take part in the study if they were aged ≥18 years; reported an Alcohol Use Disorders Identification Test (AUDIT) score ≥8, indicating excessive alcohol consumption; were residing in the United Kingdom; owned an iPhone capable of running iOS 8.0 software (ie, an iPhone 4S or later models); and were willing to download and explore an app for reducing alcohol consumption.
Participants were recruited via the online research platform Prolific. Individuals who take part in research via online platforms are primarily motivated by financial incentives and are not necessarily interested in health behavior change [ ]. Therefore, we expected that Prolific would enable us to recruit a sample of users with different levels of motivation to change. We did not, however, expect to recruit a sample that is representative of the general population of excessive drinkers in the United Kingdom.
No formal sample size calculation was performed. Based on the psychometric literature, a 25:1 participant-to-item ratio (ie, a total of 250 participants) was considered desirable.
To determine eligibility and describe the sample, data were collected on age; sex (female or male); type of work (manual, nonmanual, or other); patterns of alcohol consumption, measured by the AUDIT ; motivation to reduce alcohol consumption, measured by the Motivation To Stop Scale [ - ]; country of residence (United Kingdom or other); iPhone ownership (yes or no); and willingness to download and explore an alcohol reduction app (yes or no).
For eligible participants who downloaded and explored the Drink Less app, data were collected on location during first use of the app (home, work, vehicle, public transport, restaurant/pub/café, other’s home, other, or can’t remember) and the 10-item DBCI Engagement Scale, which captures momentary behavioral (ie, amount, depth of use) and experiential (ie, attention, interest, enjoyment) engagement with DBCIs. A detailed account of how the scale items were developed and tested in a group of experts and nonexperts can be found in a previous study [ ].
Data were also collected on the below variables, which were used to test the scale’s criterion, predictive, incremental, and divergent validity.
Construct, Criterion, and Incremental Validity
A record of the number of app screens viewed was kept during participants’ first login session to derive the objectively recorded amount of use and depth of use, which were used to test the scale’s construct, criterion, and incremental validity. The screen view records were stored in an online database (NodeChef) and extracted using the free Python library pandas. The variable amount of use was derived by calculating the time spent (in seconds) during participants’ first login session. The variable depth of use was derived by calculating the number of app components visited during participants’ first login session, indexed as a proportion (0-100) of the number of available components within the Drink Less app (ie, Goal Setting, Self-monitoring/Feedback, Action Planning, Normative Feedback, Cognitive Bias Re-Training, Identity Change, Other).
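The derivation of the two usage metrics from raw screen-view records can be sketched with pandas, the library named above. The log schema (column names, timestamp format) is hypothetical; the actual NodeChef export may differ:

```python
import pandas as pd

# Hypothetical screen-view log for one user's first login session;
# the real column names in the NodeChef export may differ.
views = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2018-07-23 10:00:00", "2018-07-23 10:01:30",
        "2018-07-23 10:03:00", "2018-07-23 10:05:00",
    ]),
    "component": ["Goal Setting", "Self-monitoring/Feedback",
                  "Goal Setting", "Normative Feedback"],
})

N_COMPONENTS = 7  # components available within the Drink Less app

# Amount of use: seconds elapsed between the first and last screen view
amount_of_use = (views["timestamp"].max() - views["timestamp"].min()).total_seconds()

# Depth of use: distinct components visited, as a 0-100 proportion
depth_of_use = 100 * views["component"].nunique() / N_COMPONENTS
```

For the session above, this yields 300 seconds of use and a depth of roughly 43 (3 of 7 components visited).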
A record of the number of app screens viewed was also kept over the next 14 days to derive the variable subsequent login, which was used to test the scale’s predictive validity. A subsequent login (yes vs no) was defined as a new screen view following at least 30 minutes of inactivity. As health apps are likely to be abandoned after users’ first login [ , ], the authors theorized that a useful measure of engagement should be able to distinguish users who are likely to return to an app from those who are not.
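Under the 30-minute inactivity rule, session boundaries (and hence the subsequent login variable) can be derived from the same timestamped screen views. A minimal sketch, using hypothetical timestamps:

```python
import pandas as pd

# Hypothetical screen-view timestamps for one user over the 14-day window
ts = pd.to_datetime(pd.Series([
    "2018-07-23 10:00", "2018-07-23 10:05",  # first login session
    "2018-07-24 19:00",                      # gap >= 30 min -> new session
]))

gaps = ts.diff()
# A session starts at the first view and after any gap of at least 30 minutes
session_start = gaps.isna() | (gaps >= pd.Timedelta(minutes=30))
n_sessions = int(session_start.sum())
subsequent_login = n_sessions > 1  # user returned after the first session
```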
Two items that represented the authors’ best bets for a short measure of engagement (ie, “How much did you like the app?” and “How engaging was the app?”) were developed by the study team and used to test whether a short measure of engagement had superior predictive validity compared with the scale in its entirety. These items were not explicitly drawn from published self-report scales.
Two items from the Flow State Scale  were used to test the scale’s divergent validity. We selected two items that were previously found to load most strongly onto the general flow factor (ie, “When using Drink Less, the way time passed seemed to be different from normal,” “When using Drink Less, I was not worried about what others may have been thinking of me”). Although there is some overlap in the experiential indicators of the states of engagement and flow (ie, focused attention, interest), the study team theorized that users do not necessarily experience loss of time and consciousness or balance between challenge and skill when engaging with a DBCI. Assessing whether users can be engaged without necessarily being in a state of flow was therefore considered a useful test of the scale’s divergent validity. The Flow State Scale has previously been applied in the context of digital gaming [ ].
Interested participants were identified via the recruitment platform, Prolific, and received a compensation of £0.50 for completing the screening questionnaire, hosted by Qualtrics survey software (Provo, Utah). Eligible participants were invited via Prolific’s internal email system and asked to download the Drink Less app from the Apple App Store. Participants were instructed to explore the Drink Less app in the way that they would explore any new app and were told that the researchers would monitor their app usage to assess what content they were interested in. For technical reasons, participants were told that they had to select the option Interested in drinking less alcohol when asked about why they were using the Drink Less app and to enable the push notifications. When clicking on the phone’s home button after having finished exploring the app, participants received a push notification with a link to the study survey. Participants were subsequently asked to enter their Prolific identification number, which enabled the researchers to match participants’ survey responses to their app screen views. Participants who initiated but did not complete the study survey (as indicated by their response status on Prolific’s platform, which was either labelled “Timed out” or “Returned submission”) were sent one reminder message. On completing the task, participants were paid £1.25.
All analyses were conducted in SPSS version 20.0 (IBM Corporation, Armonk, New York). The assumptions for parametric tests were assessed (ie, normality of the distribution of residuals) and when violated, normalization was applied (ie, z-score normalization of positively skewed data). Descriptive statistics (eg, mean, range, variance) were calculated for each scale item and the criterion variables of interest to determine suitability for factor analysis.
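As a minimal illustration of the z-score normalization step applied to skewed variables, each variable is centered on its mean and scaled by its standard deviation:

```python
import numpy as np

# Hypothetical positively skewed responses to a single scale item
x = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 7.0])

# z-score normalization: center on the mean, scale by the standard deviation
z = (x - x.mean()) / x.std()
```

The transformed scores have mean 0 and unit variance.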
It was hypothesized that a five-factor solution (ie, amount of use, depth of use, attention, interest, enjoyment) would provide the best fit to the observed data. A series of exploratory factor analyses (EFAs) using principal axis factoring estimation and oblique rotation was conducted. The inspection of Cattell’s scree plots and the Kaiser criterion (ie, factors with eigenvalues >1) was used to determine the number of factors to retain [ ]. First, we tested the fit of a solution including the self-reported items. This was compared with a solution including a combination of the self-reported indicators of experiential engagement and the objectively recorded indicators of behavioral engagement (ie, objective amount of use and depth of use).
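The Kaiser retention criterion can be illustrated with a small numpy sketch: the eigenvalues of the inter-item correlation matrix are computed, and factors with eigenvalues above 1 are retained (the same eigenvalues would feed a scree plot). The simulated two-factor item data are purely illustrative; the study itself ran principal axis factoring in SPSS:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 147  # matches the study's sample size

# Simulated responses: two "experiential" items driven by one latent factor
# and two "behavioral" items driven by another (illustrative only)
experiential = rng.normal(size=n)
behavioral = rng.normal(size=n)
items = np.column_stack([
    experiential + rng.normal(scale=0.5, size=n),
    experiential + rng.normal(scale=0.5, size=n),
    behavioral + rng.normal(scale=0.5, size=n),
    behavioral + rng.normal(scale=0.5, size=n),
])

# Kaiser criterion: retain factors whose eigenvalues exceed 1
corr = np.corrcoef(items, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # sorted descending
n_factors = int((eigenvalues > 1).sum())      # 2 for this structure
```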
Internal consistency reliability was assessed by calculating the Cronbach alpha. A large coefficient (ie, α=.70 or above) was interpreted as evidence of strong item covariance.
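Cronbach's alpha can be computed directly from an item-score matrix; a minimal reference implementation (the study used SPSS's built-in routine):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Sanity check: three perfectly parallel items give alpha = 1
parallel = np.column_stack([np.arange(10.0)] * 3)
alpha_parallel = cronbach_alpha(parallel)
```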
Criterion validity was assessed by calculating the Pearson correlation coefficient for the relationship between participants’ automatically recorded app screen views from their first login (ie, objective amount of use and depth of use) with their self-reported amount of use and depth of use and their total scale scores.
The variable subsequent login was regressed onto participants’ total scale scores, with and without adjustment for motivation to reduce alcohol consumption.
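The predictive validity analysis is a binary logistic regression of subsequent login on scale scores. A self-contained sketch of the model fit, using Newton-Raphson estimation on hypothetical toy data rather than the study's SPSS routine:

```python
import numpy as np

def logit_fit(X, y, n_iter=25):
    """Minimal logistic regression via Newton-Raphson.
    Returns coefficients with the intercept first (a stand-in for the
    SPSS logistic regression procedure used in the study)."""
    X = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, grad)
    return beta

# Hypothetical toy data: x = engagement score (binarized for illustration),
# y = subsequent login; the odds of a login are 1/3 at x=0 and 3 at x=1
x = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0])

beta = logit_fit(x[:, None], y)
odds_ratio = np.exp(beta[1])  # odds ratio per unit increase in x (9 here)
```

Adjusting for motivation amounts to appending the covariate as a second column of the predictor matrix before fitting.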
The variable subsequent login was also regressed onto each of the two best bets for a short measure of engagement (ie, “How engaging was the app?” and “How much did you like the app?”), with and without adjustment for motivation to reduce alcohol consumption.
Incremental validity was assessed in two steps. First, we assessed the variance accounted for in the variable subsequent login by the objectively recorded indicators of behavioral engagement. This was compared with the variance accounted for in the variable subsequent login after adding the self-reported indicators of experiential engagement to the objectively recorded indicators of behavioral engagement.
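SPSS typically reports the "variance accounted for" in logistic models as the Nagelkerke pseudo-R²; assuming that is the statistic used here, the two-step comparison amounts to computing this quantity for each model from its log-likelihood:

```python
import numpy as np

def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke pseudo-R^2 from the null and fitted log-likelihoods
    (an assumption about which statistic the study reported)."""
    cox_snell = 1 - np.exp(2 * (ll_null - ll_model) / n)
    max_cox_snell = 1 - np.exp(2 * ll_null / n)
    return cox_snell / max_cox_snell

# Illustrative values only (not the study's log-likelihoods)
r2_null = nagelkerke_r2(-100.0, -100.0, 147)   # no improvement -> 0
r2_perfect = nagelkerke_r2(-100.0, 0.0, 147)   # perfect fit -> 1
```

Incremental validity is then the increase in pseudo-R² from the behavioral-only model to the model that adds the experiential indicators.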
Divergent validity was assessed by calculating the Pearson correlation coefficient for the relationship between each of the two indicators of the state of flow from the Flow State Scale  and the overall measure of engagement.
During the study period (31 days; July 23, 2018 to August 22, 2018), 401 participants completed the online screening survey, of whom 266 were eligible to take part. Of these, 147 (55%) participants downloaded the Drink Less app and completed the task. Due to funding restrictions, we were unable to extend the recruitment beyond this time point. The desired target sample size of 250 participants was hence not achieved. Participants’ demographic and drinking characteristics are reported in the table below. We did not detect any significant differences between eligible participants who did and did not complete the task on the demographic characteristics assessed.
| Demographic characteristics | Completed scale (n=147) | Eligible but did not complete scale (n=119) | P value^a |
| --- | --- | --- | --- |
| Female gender, n (%) | 97 (66) | 71 (60) | .29 |
| Type of work, n (%) | | | .57 |
| Manual | 19 (13) | 16 (13) | |
| Nonmanual | 89 (61) | 78 (66) | |
| Other | 39 (27) | 25 (21) | |
| Age (years), mean (SD) | 34.4 (10.4) | 36.6 (11.8) | .11 |
| Motivation To Stop Scale, n (%) | | | .08 |
| I don’t want to cut down on drinking alcohol | 14 (10) | 26 (22) | |
| I think I should cut down on drinking alcohol but I don’t really want to | 43 (29) | 25 (21) | |
| I want to cut down on drinking alcohol but I haven’t thought about when | 19 (13) | 17 (14) | |
| I really want to cut down on drinking alcohol but I don’t know when I will | 17 (12) | 11 (9) | |
| I want to cut down on drinking and hope to soon | 23 (16) | 17 (14) | |
| I really want to cut down on drinking alcohol and intend to in the next 3 months | 11 (7) | 4 (3) | |
| I really want to cut down on drinking alcohol and intend to in the next month | 20 (14) | 19 (16) | |
| Alcohol Use Disorders Identification Test score, mean (SD) | 15.4 (5.1) | 14.2 (5.7) | .07 |

^a Differences between groups were assessed using chi-square tests or t tests, as appropriate.
Descriptive statistics for the scale items are reported in the table below. The majority of participants completed the scale at home (118/147, 80.3%) or at work (19/147, 12.9%). To account for the observed skewness, z-score normalization was applied to the 10 scale items and the two items used for testing the scale’s criterion validity. Inter-item correlations of the normalized scale items are also reported below.
The Kaiser-Meyer-Olkin measure of sampling adequacy (0.70) and the Bartlett test of sphericity (P<.001) indicated that the data were suitable for factor analysis. Three EFA solutions were tested to arrive at a best-fitting solution.
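For reference, both suitability checks can be computed from the inter-item correlation matrix alone; a compact implementation (the compound-symmetric demo matrix is illustrative, not the study data):

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    """Bartlett's test that correlation matrix R is an identity
    (n = number of respondents). Returns (chi-square statistic, P value)."""
    p = R.shape[0]
    stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    return stat, chi2.sf(stat, p * (p - 1) / 2)

def kmo(R):
    """Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)   # partial correlations
    np.fill_diagonal(partial, 0.0)
    off = R - np.eye(R.shape[0])      # zero the diagonal of R
    return (off**2).sum() / ((off**2).sum() + (partial**2).sum())

# Demo on a compound-symmetric matrix (r = .5 among 3 items), not study data
R = np.full((3, 3), 0.5)
np.fill_diagonal(R, 1.0)
stat, pval = bartlett_sphericity(R, n=147)
adequacy = kmo(R)
```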
| DBCI^a Engagement Scale items | Range | Mean (SD) | Variance | Skewness | Kurtosis |
| --- | --- | --- | --- | --- | --- |
| 1. “How strongly did you experience interest?” | 2-7 | 5.30 (1.09) | 1.18 | –0.30 | 0.06 |
| 2. “How strongly did you experience intrigue?” | 1-7 | 5.39 (1.27) | 1.61 | –0.85 | 0.50 |
| 3. “How strongly did you experience focus?” | 2-7 | 5.31 (1.18) | 1.40 | –0.56 | 0.14 |
| 4. “How strongly did you experience inattention?”^b | 1-7 | 5.61 (1.33) | 1.76 | –1.24 | 1.47 |
| 5. “How strongly did you experience distraction?”^b | 1-7 | 5.47 (1.45) | 2.10 | –1.12 | 0.86 |
| 6. “How strongly did you experience enjoyment?” | 1-7 | 4.46 (1.44) | 2.07 | –0.10 | –0.48 |
| 7. “How strongly did you experience pleasure?” | 1-7 | 3.56 (1.64) | 2.67 | 0.36 | –0.70 |
| 8. “How strongly did you experience annoyance?”^b | 1-7 | 5.59 (1.39) | 1.93 | –1.09 | 1.08 |
| 9. “Which of the app’s components did you visit?” | 14.29-100.00 | 58.70 (22.00) | 484.01 | –0.12 | –0.67 |
| 10. “How much time do you roughly think that you spent on the app?” (seconds) | 120-1200 | 520.82 (237.21) | 56,267.82 | 0.93 | 0.96 |
| Variables used to test the scale’s construct, criterion, and incremental validity | | | | | |
| 11. Objectively recorded depth of use | 28.57-100.00 | 66.66 (20.50) | 420.28 | –0.23 | –0.85 |
| 12. Objectively recorded amount of use (seconds) | 95.00-3,571.00 | 409.45 (360.71) | 130,116.72 | 5.13 | 40.34 |
| Items used to test the scale’s divergent validity | | | | | |
| 13. “When using Drink Less, the way time passed seemed different from normal.” | 1-5 | 2.76 (0.79) | 0.62 | 0.11 | 0.10 |
| 14. “When using Drink Less, I was not worried about what others may have been thinking about me.” | 1-5 | 3.34 (1.16) | 1.35 | –0.24 | –1.11 |
| Variables/items used to test the scale’s predictive validity | | | | | |
| 15. “How much did you like the app?” | 1-7 | 5.14 (1.29) | 1.66 | –0.80 | 0.82 |
| 16. “How engaging was the app?” | 1-7 | 5.20 (1.17) | 1.37 | –0.65 | 0.66 |
| 17. Subsequent login (yes vs no), n (%) | 67 (46) | N/A^c | N/A | N/A | N/A |

^a DBCI: digital behavior change intervention.
^b Values were reverse scored prior to the calculation of descriptive statistics.
^c N/A: not applicable.
| DBCI Engagement Scale items | 1^a | 2^b | 3^c | 4^d,e | 5^e,f | 6^g | 7^h | 8^e,i | 9^j | 10^k | 11^l,m | 12^m,n |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9. Which of app’s components | 0.18 | 0.00 | 0.06 | 0.13 | –0.03 | 0.19 | 0.19 | 0.13 | 1 | | | |
| 10. How much time spent | 0.10 | 0.10 | –0.03 | 0.08 | 0.11 | 0.15 | 0.33 | 0.09 | 0.29 | 1 | | |
| 11. Objective depth of use^m | 0.13 | 0.11 | 0.15 | 0.18 | 0.01 | 0.11 | –0.01 | 0.24 | 0.51 | 0.16 | 1 | |
| 12. Objective amount of use^m | 0.31 | 0.18 | 0.28 | 0.16 | 0.06 | 0.25 | 0.00 | 0.19 | 0.10 | 0.10 | 0.52 | 1 |

^e Values were reverse scored prior to analysis.
^j Which of the app’s components.
^k How much time spent.
^l Objective depth of use.
^m Variables used to test the scale’s construct, criterion, and incremental validity.
^n Objective amount of use.
An EFA with oblique rotation was conducted. The eigenvalues indicated that a three-factor solution, accounting for 61.2% of the variance, was most appropriate. The loadings indicated that the second factor comprised two of the negatively worded indicators (ie, items 4 and 5). The third factor comprised the two behavioral indicators (ie, items 9 and 10) and one of the experiential indicators (ie, item 7), which made little theoretical sense [ ]. The loading of item 8 (also a negatively worded item) onto factor 1 was modest. Therefore, the negatively worded items (ie, items 4, 5, and 8) and item 7 were discarded prior to conducting a second EFA.
A subsequent EFA with oblique rotation indicated that a two-factor solution accounted for 62.4% of the variance. The experiential indicators loaded clearly onto factor 1, and the behavioral indicators loaded clearly onto factor 2, with no cross-loadings (ie, items that load at 0.32 or higher on two or more factors) [ ]. The two latent factors were labelled Experiential Engagement and Behavioral Engagement, respectively.
An EFA with oblique rotation using a combination of the self-reported experiential indicators (ie, items 1, 2, 3, and 6) and the automatically recorded behavioral indicators (ie, items 11 and 12) suggested a two-factor solution, which accounted for 65.7% of the variance. The experiential indicators loaded clearly onto factor 1, and the behavioral indicators loaded clearly onto factor 2.
Solution 2 was selected for use in the subsequent reliability and validity analyses, as it contained only the self-reported items and provided a similarly good fit of the data as Solution 3. A total scale score was calculated for each participant, with equal weight given to each of the retained items (ie, items 1, 2, 3, 6, 9, and 10).
| Scale items | Solution 1^a: Factor 1 | Solution 1: Factor 2 | Solution 1: Factor 3 | Solution 2^b: Factor 1 | Solution 2: Factor 2 | Solution 3^c: Factor 1 | Solution 3: Factor 2 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 9. Which of app’s components | 0.16 | 0.01 | 0.43^d | 0.15 | 0.55 | N/A | N/A |
| 10. How much time spent | 0.10 | 0.03 | 0.64^d | 0.09 | 0.53 | N/A | N/A |
| 11. Objective depth of use | N/A | N/A | N/A | N/A | N/A | 0.37 | 0.77^d |
| 12. Objective amount of use | N/A | N/A | N/A | N/A | N/A | 0.18 | 0.68^d |

^a Exploratory factor analysis with oblique rotation, including items 1-10.
^b Exploratory factor analysis with oblique rotation, including items 1, 2, 3, 6, 9, and 10.
^c Exploratory factor analysis with oblique rotation, including items 1, 2, 3, 6, 11, and 12.
^d Values with factor loadings ≥0.40.
^e Values were reverse scored prior to analysis.
The internal consistency of the overall measure was α=.67, indicating moderate internal reliability. The Experiential Engagement subscale had an internal consistency of α=.78, while the Behavioral Engagement subscale had an internal consistency of α=.45. Both subscales were significantly correlated with the overall measure (r145=0.90, P<.001 and r145=0.56, P<.001, respectively). However, the subscales were not significantly correlated with each other (r145=0.15, P=.07).
Total scale scores were significantly correlated with objectively recorded depth of use (r145=0.32, P<.001) and objectively recorded amount of use (r145=0.33, P<.001). Self-reported depth of use was significantly correlated with objectively recorded depth of use (r145=0.51, P<.001). Self-reported amount of use was not significantly correlated with objectively recorded amount of use (r145=0.10, P=.23).
Results from the predictive validity analyses are presented in the table below. In the unadjusted analysis, total scale scores were significantly associated with future behavioral engagement, ie, the variable subsequent login (odds ratio [OR]=1.15, 95% CI 1.05-1.27, P=.01). The association remained significant in the model adjusting for motivation to reduce alcohol consumption (adjusted OR [ORadj]=1.14, 95% CI 1.03-1.27, P=.01).
As the two subscales (ie, Behavioral Engagement and Experiential Engagement) were not significantly correlated with each other, an unplanned analysis was conducted to assess the independent association of each subscale with future behavioral engagement. In unadjusted and adjusted analyses, Experiential Engagement was significantly associated with future behavioral engagement (ORadj=1.19, 95% CI 1.05-1.34, P=.006). In unadjusted and adjusted analyses, Behavioral Engagement was not significantly associated with future behavioral engagement (ORadj=1.31, 95% CI 0.38-4.59, P=.67).
In unadjusted and adjusted analyses, asking users about how engaging they thought the app was did not significantly predict future behavioral engagement (ORadj=1.34, 95% CI 0.98-1.84, P=.07). In unadjusted and adjusted analyses, asking users about how much they liked the app significantly predicted future behavioral engagement (ORadj=1.38, 95% CI 1.03-1.84, P=.03).
| Predictor variables | Odds ratio (95% CI) | P value | Adjusted odds ratio^a (95% CI) | P value |
| --- | --- | --- | --- | --- |
| Total DBCI^b Engagement Scale score | 1.15 (1.05-1.27) | .005 | 1.14 (1.03-1.27) | .009 |
| Subscale 1 - Experiential Engagement | 1.19 (1.06-1.34) | .004 | 1.19 (1.05-1.34) | .006 |
| Subscale 2 - Behavioral Engagement | 1.11 (0.90-1.36) | .34 | 1.08 (0.87-1.35) | .48 |
| “How engaging was the app?” | 1.28 (0.96-1.71) | .097 | 1.34 (0.98-1.84) | .07 |
| “How much did you like the app?” | 1.39 (1.05-1.83) | .02 | 1.38 (1.03-1.84) | .03 |

^a Odds ratios adjusted for motivation to reduce alcohol consumption.
^b DBCI: digital behavior change intervention.
Results from the incremental validity analyses are reported in the table below. The automatically recorded behavioral indicators of engagement (ie, items 11 and 12; Model 1) accounted for 15.9% of the variance in the variable subsequent login. The automatically recorded behavioral indicators in combination with the self-reported experiential indicators of engagement (ie, items 1, 2, 3, and 6; Model 2) accounted for 21.1% of the variance in the variable subsequent login.
| Models | Odds ratio (95% CI) | P value | Variance accounted for (%) |
| --- | --- | --- | --- |
| Model 1 (objectively recorded behavioral indicators) | | | 15.9 |
| Objectively recorded amount of use | 3.46 (1.58-7.57) | .002 | |
| Objectively recorded depth of use | 0.91 (0.58-1.42) | .67 | |
| Model 2 (behavioral plus self-reported experiential indicators) | | | 21.1 |
| Objectively recorded amount of use | 2.86 (1.25-6.55) | .013 | |
| Objectively recorded depth of use | 0.95 (0.60-1.50) | .82 | |
Total scale scores were significantly correlated with the first (“When using Drink Less, the way time passed seemed different from normal”) but not the second (“When using Drink Less, I was not worried about what others may have been thinking about me”) indicator of flow (r145=0.25, P<.01 and r145=–0.01, P=.95, respectively). The two items tapping flow were not significantly correlated with one another in this sample (r145=–0.06, P=.47).
The DBCI Engagement Scale was found to be underpinned by two largely independent factors, which were labelled Experiential Engagement and Behavioral Engagement. The scale showed moderate internal reliability but low divergent and criterion validity. Importantly, the behavioral subscale may not be a valid indicator of behavioral engagement. Total scale scores were weakly associated with future behavioral engagement (ie, the variable subsequent login), as were the experiential subscale and one of the best bets for a short measure of engagement (ie, asking participants how much they liked the app). The behavioral subscale was not independently associated with future behavioral engagement. In addition, a model including the self-reported experiential and objectively recorded behavioral indicators of engagement (as compared with a model including only the objectively recorded behavioral indicators) accounted for a larger proportion of variance in future behavioral engagement. These findings are at odds with those from the first evaluation of the DBCI Engagement Scale, in which the scale was found to be underpinned by a single factor. However, these differences may at least partly be accounted for by the small sample size in this study.
The finding that the Experiential Engagement and Behavioral Engagement subscales were not significantly correlated with each other in this study lends support to the argument that users can spend time on a DBCI without necessarily being interested in or paying attention to its content, and vice versa. However, this finding also gives rise to the question of whether experiential and behavioral engagement are part of the same higher-order construct.
The finding that participants’ total scale scores were weakly associated with future behavioral engagement even when adjusting for motivation to reduce alcohol consumption serves as initial evidence that the state of engagement with a DBCI is conceptually distinct from motivation to change the target behavior.
Incremental and Predictive Validity
The results from the incremental validity analyses suggest that behavioral and experiential indicators in tandem have superior predictive power compared with the behavioral indicators alone. However, the finding that the experiential, but not the behavioral, subscale was independently associated with future behavioral engagement suggests that the experiential indicators (particularly users’ interest) were driving the association between initial and future engagement. A potential explanation for these findings is that more intensive engagement during the first login session might have made users’ memory of the app more salient, which in turn might have made them more likely to remember to return to the app. As one of the short measures of engagement (ie, the item asking how much users liked the app) was also found to predict future engagement, it is possible that not only the salience of the app, but a salient memory of liking the app, is important for future engagement. It is unclear why the first, but not the second, short measure of engagement had significant predictive power; the word “liking” might be easier to interpret than the word “engaging.” The potential mechanisms underlying the relationship between initial experiential and behavioral engagement and future behavioral engagement (ie, the variable subsequent login) should be explored further using experience sampling techniques in the first few hours following initial app engagement; this involves repeated measurements of psychological processes in real time, in users’ natural environments.
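The adjusted odds ratios reported for the scale (eg, ORadj=1.14 per scale point for subsequent login) can be translated into predicted probabilities, which some readers may find more intuitive. A minimal sketch of that arithmetic, assuming a hypothetical 30% baseline probability of a subsequent login (the baseline value is illustrative, not taken from the study):

```python
import math

def adjusted_probability(p0: float, odds_ratio: float, delta: float) -> float:
    """Shift a baseline probability p0 by `delta` scale points,
    given a per-point odds ratio from a logistic model:
    odds scale multiplicatively, probabilities do not."""
    odds = p0 / (1 - p0) * odds_ratio ** delta
    return odds / (1 + odds)

# Illustration with the reported ORadj = 1.14 per scale point and a
# hypothetical 30% baseline chance of a subsequent login:
p = adjusted_probability(0.30, 1.14, 5)  # predicted probability 5 points higher
```

The key point the sketch illustrates is that an odds ratio applies per unit of the predictor, so small per-point effects compound over the range of the scale.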
These results also raise the question of whether future behavioral engagement is the most appropriate criterion variable to test an engagement scale against. For example, knowledge retention or skill acquisition may be more theoretically sound, as suggested by the Elaboration Likelihood Model of Persuasion (ELMP). The ELMP argues that deep information processing occurs when an individual pays attention to (or engages with) a health message, which leads to increased knowledge retention. It is plausible that initial behavioral and experiential engagement in tandem have superior predictive power compared with behavioral engagement alone when used to predict knowledge retention. In addition, it would be useful to assess whether the new measure of moment-to-moment (or state-like) engagement is able to predict intervention effectiveness at a later time point.
The finding that the self-reported and objectively recorded indicators of amount of use were not significantly correlated in this sample suggests that the DBCI Engagement Scale may not be a valid indicator of behavioral engagement. However, although the amount of use (ie, time spent in minutes or seconds) is typically used as a gold standard or ground truth of behavioral engagement, our results showed that objectively recorded amount of use was significantly correlated with many of the experiential indicators (eg, interest, intrigue). Although the exploratory factor analyses did not indicate that amount of use loads onto the same factor as the experiential indicators of engagement, the observed pattern of correlations leads us to question whether time spent on a DBCI is deserving of its ground truth status. There is, hence, a need for future research to investigate the source of the discrepancy between self-reported and objectively recorded indicators of amount of use.
In line with the first study evaluating the scale, this study did not provide evidence that the DBCI Engagement Scale diverges from the Flow State Scale. There is conceptual overlap between engagement with DBCIs and the dimension of flow labeled “losing track of time.” It should be noted that the proposed definition of engagement was, in part, developed based on the concept of flow. It may hence be more fruitful to assess the scale’s divergent validity against a more conceptually distinct measure in the future. The lack of evidence that the DBCI Engagement Scale diverges from the Flow State Scale may also serve as a plausible explanation for why participants’ self-reported amount of use was not significantly correlated with their objectively recorded amount of use; they may have lost track of time when engaging with the Drink Less app. This finding suggests that self-reported and objectively recorded indicators of time spent on a DBCI may tap different constructs; future research is required to examine which of these is more strongly related to key outcomes of interest.
This study was limited because it did not achieve the desired sample size of 250 participants. As Prolific is a novel platform with a small proportion of individuals meeting the study eligibility criteria (ie, drinking alcohol excessively, willing to download an alcohol reduction app, owning an iPhone), the extant participant pool was exhausted after screening just over 400 participants. Although the participant-to-item ratio is considered key in determining the minimum necessary sample size for conducting factor analyses, findings from simulation studies indicate that other factors, including the number of items per factor and the level of communality between items, also influence sample size requirements. Given the limited participant-to-item ratio and the small number of items per factor in this study, the two-factor solution should be interpreted with caution and merits replication in a larger sample in future research. A second limitation is that market research indicates that iOS users are, on average, more affluent than Android users. As the Drink Less app is currently available for iOS users only, our findings may not be generalizable to Android users.
Studies conducted via Prolific that involve an initial screening study followed by inviting eligible participants to complete the actual study tend to have attrition rates of approximately 20%-25%, and not the 45% observed here. It is therefore likely that there were systematic differences between eligible participants who completed the task and those who did not. For example, the small financial reward may not have been perceived as worth the effort of downloading an app. Indeed, a study assessing the demographic and psychological characteristics of participants who regularly complete research tasks via Amazon’s Mechanical Turk online platform (which is similar to Prolific) found that the majority of surveyed participants reported that earning money was a key motivator for taking part. It should also be noted that the financial incentive may have interfered with participants’ naturalistic engagement, thus limiting the generalizability of the findings. Previous research has found that money can be an important motivator in DBCI research and can increase response rates in longitudinal studies.
We did not want to overburden users; hence, we did not assess key trait-like variables that may have influenced users’ scale scores. For example, it would have been useful to attempt to partial out the variance accounted for by users’ personality traits, such as those specified in the Big Five model of personality, to ensure that the DBCI Engagement Scale is detecting something beyond high conscientiousness or low neuroticism.
The adjustment for participants’ motivation to reduce their alcohol consumption should have increased the item covariance on the DBCI Engagement Scale and is hence considered a study strength. It should, however, be noted that participants’ motivation may have interacted with their engagement levels. Hence, despite the adjustment for participants’ motivation to change, the scale scores may not fully represent participants’ “true” engagement scores.
Finally, the decision to use Google’s cutoff (ie, 30 minutes of inactivity) to identify whether users had made a subsequent login is, to the best of our knowledge, not grounded in evidence about session length. Future research should explore whether this constitutes a useful heuristic for identifying new DBCI sessions using both quantitative and qualitative methods.
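For researchers applying the same heuristic to their own event logs, the sessionization rule is straightforward: events separated by 30 minutes or more of inactivity belong to different sessions, and more than one session implies a subsequent login. A minimal sketch (the event timestamps are illustrative):

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # Google Analytics' inactivity cutoff

def count_sessions(timestamps):
    """Group app events into sessions: a gap of >= 30 minutes of
    inactivity starts a new session, so >1 session implies a
    subsequent login in the sense used in this study."""
    ts = sorted(timestamps)
    if not ts:
        return 0
    sessions = 1
    for prev, cur in zip(ts, ts[1:]):
        if cur - prev >= SESSION_GAP:
            sessions += 1
    return sessions

# Illustrative log: two events 10 minutes apart, then one 50 minutes later
events = [datetime(2019, 1, 1, 9, 0),
          datetime(2019, 1, 1, 9, 10),   # same session
          datetime(2019, 1, 1, 10, 0)]   # 50-min gap: new session
subsequent_login = count_sessions(events) > 1  # True
```

Varying `SESSION_GAP` in such a sketch would be one simple way to test, quantitatively, how sensitive the subsequent login variable is to the choice of cutoff.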
Avenues for Future Research
Due to the observed nonnormal distributions of the scale items that jointly form the DBCI Engagement Scale, a decision was made to use z-score normalization. Consequently, total scores on the DBCI Engagement Scale are only meaningful in relation to the average intensity of experiential and behavioral engagement that a particular DBCI generates. This may facilitate attempts to develop cutoffs for “high” and “low” engagers across DBCIs, irrespective of their specific parameters (eg, the number and length of intervention components). For example, users with scores that fall within a particular range of SDs above or below the mean might usefully be classified as “high” or “low” engagers, and these patterns may replicate across DBCIs. The question of whether the mean and spread of engagement scores replicate across DBCIs merits exploration by evaluating the DBCI Engagement Scale across different kinds of DBCIs (eg, websites or apps for smoking cessation or physical activity).
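The SD-based classification described above can be sketched in a few lines. This is an illustration of the proposed approach, not a validated procedure: the ±1 SD cutoff and the raw scores are hypothetical.

```python
from statistics import mean, stdev

def z_scores(raw):
    """Standardize raw engagement scores against this DBCI's own
    distribution (z-score normalization)."""
    m, s = mean(raw), stdev(raw)
    return [(x - m) / s for x in raw]

def classify(z, cutoff=1.0):
    """Label users relative to the DBCI-specific mean; the +/-1 SD
    cutoff is illustrative, not a validated threshold."""
    if z >= cutoff:
        return "high"
    if z <= -cutoff:
        return "low"
    return "average"

# Hypothetical raw scale scores for eight users of one DBCI
scores = [12, 15, 14, 30, 8, 16, 13, 14]
labels = [classify(z) for z in z_scores(scores)]
```

Because the standardization is relative to each DBCI's own distribution, the resulting “high”/“low” labels are comparable across interventions only if, as noted above, the mean and spread of engagement scores replicate across DBCIs.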
The finding that initial experiential engagement (or liking of the app) was independently associated with future behavioral engagement suggests that intervention developers should think carefully about how to make their DBCIs appealing on first use. The DBCI Engagement Scale may be useful during the iterative design process, comparing users’ experiences of differently designed graphical user interfaces.
The DBCI Engagement Scale assesses behavioral and experiential aspects of engagement. The behavioral subscale may not be a valid indicator of behavioral engagement. The experiential subscale can predict subsequent behavioral engagement with an app for reducing alcohol consumption. Further refinements and validation of the scale in larger samples and across different DBCIs are needed.
OP is funded by a PhD studentship from Bupa under its partnership with UCL. CG receives salary support from Cancer Research UK (C1417/A22962). SM is funded by Cancer Research UK and the NIHR School for Public Health Research. We gratefully acknowledge all the funding received.
OP, JL, CG, AB, RW, and SM designed the study. OP collected the data, conducted the statistical analyses, and wrote the first draft of the manuscript. All authors have contributed to the final version of the manuscript and agree with its submission to JMIR.
Conflicts of Interest
OP, CG, AB, and SM have no conflicts of interest to declare. RW undertakes research and consultancy and receives fees for speaking from companies that develop and manufacture smoking cessation medications. JL is an employee at Prolific.
- Donkin L, Christensen H, Naismith S, Neal B, Hickie I, Glozier N. A systematic review of the impact of adherence on the effectiveness of e-therapies. J Med Internet Res 2011 Aug 05;13(3):e52 [FREE Full text] [CrossRef] [Medline]
- Michie S, Yardley L, West R, Patrick K, Greaves F. Developing and Evaluating Digital Interventions to Promote Behavior Change in Health and Health Care: Recommendations Resulting From an International Workshop. J Med Internet Res 2017 Jun 29;19(6):e232 [FREE Full text] [CrossRef] [Medline]
- Kelders S, Kok R, Ossebaard H, van Gemert-Pijnen JEWC. Persuasive system design does matter: a systematic review of adherence to web-based interventions. J Med Internet Res 2012 Nov 14;14(6):e152 [FREE Full text] [CrossRef] [Medline]
- Cobb NK, Graham AL, Bock BC, Papandonatos G, Abrams DB. Initial evaluation of a real-world Internet smoking cessation system. Nicotine Tob Res 2005 Apr;7(2):207-216 [FREE Full text] [CrossRef] [Medline]
- Alexander GL, McClure JB, Calvi JH, Divine GW, Stopponi MA, Rolnick SJ, et al. A randomized clinical trial evaluating online interventions to improve fruit and vegetable consumption. Am J Public Health 2010 Feb;100(2):319-326 [FREE Full text] [CrossRef] [Medline]
- Yardley L, Spring B, Riper H, Morrison L, Crane D, Curtis K, et al. Understanding and Promoting Effective Engagement With Digital Behavior Change Interventions. Am J Prev Med 2016 Nov;51(5):833-842 [FREE Full text] [CrossRef] [Medline]
- Perski O, Blandford A, West R, Michie S. Conceptualising engagement with digital behaviour change interventions: a systematic review using principles from critical interpretive synthesis. Transl Behav Med 2017 Jun;7(2):254-267 [FREE Full text] [CrossRef] [Medline]
- Sieverink F, Kelders SM, van Gemert-Pijnen JE. Clarifying the Concept of Adherence to eHealth Technology: Systematic Review on When Usage Becomes Adherence. J Med Internet Res 2017 Dec 06;19(12):e402 [FREE Full text] [CrossRef] [Medline]
- Milward J, Drummond C, Fincham-Campbell S, Deluca P. What makes online substance-use interventions engaging? A systematic review and narrative synthesis. Digit Health 2018;4:2055207617743354 [FREE Full text] [CrossRef] [Medline]
- Michie S, Richardson M, Johnston M, Abraham C, Francis J, Hardeman W, et al. The behavior change technique taxonomy (v1) of 93 hierarchically clustered techniques: building an international consensus for the reporting of behavior change interventions. Ann Behav Med 2013 Aug;46(1):81-95. [CrossRef] [Medline]
- Bellg AJ, Borrelli B, Resnick B, Hecht J, Minicucci DS, Ory M, et al. Enhancing Treatment Fidelity in Health Behavior Change Studies: Best Practices and Recommendations From the NIH Behavior Change Consortium. Health Psychology 2004;23(5):443-451. [CrossRef]
- Csikszentmihalyi M. Flow: The psychology of optimal performance. New York: Cambridge University Press; 1990.
- Brown E, Cairns P. ACM Digital Library. 2004. A grounded investigation of game immersion. URL: https://dl.acm.org/citation.cfm?doid=985921.986048 [accessed 2019-11-12]
- Perski O, Blandford A, Garnett C, Crane D, West R, Michie S. A self-report measure of engagement with digital behavior change interventions (DBCIs): development and psychometric evaluation of the "DBCI Engagement Scale". Transl Behav Med 2019 Mar 30. [CrossRef] [Medline]
- Short CE, DeSmet A, Woods C, Williams SL, Maher C, Middelweerd A, et al. Measuring Engagement in eHealth and mHealth Behavior Change Interventions: Viewpoint of Methodologies. J Med Internet Res 2018 Nov 16;20(11):e292. [CrossRef]
- Pham Q, Graham G, Carrion C, Morita PP, Seto E, Stinson JN, et al. A Library of Analytic Indicators to Evaluate Effective Engagement with Consumer mHealth Apps for Chronic Conditions: Scoping Review. JMIR Mhealth Uhealth 2019 Jan 18;7(1):e11941. [CrossRef]
- Miller S, Ainsworth B, Yardley L, Milton A, Weal M, Smith P, et al. A Framework for Analyzing and Measuring Usage and Engagement Data (AMUsED) in Digital Interventions: Viewpoint. J Med Internet Res 2019 Feb 15;21(2):e10966. [CrossRef]
- Murray E, White IR, Varagunam M, Godfrey C, Khadjesari Z, McCambridge J. Attrition revisited: adherence and retention in a web-based alcohol trial. J Med Internet Res 2013;15(8):e162 [FREE Full text] [CrossRef] [Medline]
- Postel M, de Haan HA, ter Huurne ED, van der Palen J, Becker E, de Jong CAJ. Attrition in web-based treatment for problem drinkers. J Med Internet Res 2011 Dec 27;13(4):e117 [FREE Full text] [CrossRef] [Medline]
- Radtke T, Ostergaard M, Cooke R, Scholz U. Web-Based Alcohol Intervention: Study of Systematic Attrition of Heavy Drinkers. J Med Internet Res 2017 Jun 28;19(6):e217 [FREE Full text] [CrossRef] [Medline]
- Perski O, Lumsden J, Garnett C, Blandford A, West R, Michie S. Second Evaluation of the DBCI Engagement Scale. Open Sci Framew 2019 [FREE Full text]
- Babor T, Higgins-Biddle J, Saunders J, Monteiro M. The Alcohol Use Disorders Identification Test: Guidelines for Use in Primary Care. 2nd ed. Geneva: World Health Organization; 2001.
- Peer E, Brandimarte L, Samat S, Acquisti A. Beyond the Turk: Alternative platforms for crowdsourcing behavioral research. Journal of Experimental Social Psychology 2017 May;70:153-163. [CrossRef]
- Paolacci G, Chandler J, Ipeirotis P. Running experiments on Amazon Mechanical Turk. Judgment and Decision Making 2010;5(5):411-419 [FREE Full text] [Medline]
- Costello A, Osborne JW. Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis. Practical Assessment, Research & Evaluation 2005;10:173-178.
- Kotz D, Brown J, West R. Predictive validity of the Motivation To Stop Scale (MTSS): a single-item measure of motivation to stop smoking. Drug Alcohol Depend 2013 Feb 01;128(1-2):15-19 [FREE Full text] [CrossRef] [Medline]
- Hummel K, Brown J, Willemsen MC, West R, Kotz D. External validation of the Motivation To Stop Scale (MTSS): findings from the International Tobacco Control (ITC) Netherlands Survey. Eur J Public Health 2017 Feb 01;27(1):129-134. [CrossRef] [Medline]
- de Vocht F, Brown J, Beard E, Angus C, Brennan A, Michie S, et al. Temporal patterns of alcohol consumption and attempts to reduce alcohol intake in England. BMC Public Health 2016 Sep 01;16:917 [FREE Full text] [CrossRef] [Medline]
- Crane D, Garnett C, Michie S, West R, Brown J. A smartphone app to reduce excessive alcohol consumption: Identifying the effectiveness of intervention components in a factorial randomised control trial. Sci Rep 2018 Mar 12;8(1):4384 [FREE Full text] [CrossRef] [Medline]
- Google Analytics Help. 2017. How a web session is defined in Analytics. URL: https://support.google.com/analytics/answer/2731565 [accessed 2018-02-06]
- Braze Magazine. Spring 2016 Mobile Customer Retention Report. 2016. URL: https://www.braze.com/blog/app-customer-retention-spring-2016-report/ [accessed 2019-09-09]
- CISION PRWeb. 2015. Motivating Patients to Use Smartphone Health Apps. URL: http://www.prweb.com/releases/2011/04/prweb5268884.htm [accessed 2015-08-10]
- Jackson S, Marsh H. Development and validation of a scale to measure optimal experience: The Flow State Scale. J Sport Exerc Psychol 1996;18:35 [FREE Full text]
- Wiebe E, Lamb A, Hardy M, Sharek D. Measuring engagement in video game-based environments: Investigation of the User Engagement Scale. Comput Human Behav 2014;32:132. [CrossRef]
- Hinkin T. A Brief Tutorial on the Development of Measures for Use in Survey Questionnaires. Organ Res Methods 1998;1(1):121. [CrossRef]
- Stone A, Shiffman S. Ecological Momentary Assessment (Ema) in Behavioral Medicine. Annals of Behavioral Medicine 1994;16(3):199-202. [CrossRef]
- Petty R, Cacioppo J. The Elaboration Likelihood Model of Persuasion. Adv Exp Soc Psychol 1986;19:205. [CrossRef]
- Mundfrom DJ, Shaw DG, Ke TL. Minimum Sample Size Recommendations for Conducting Factor Analyses. International Journal of Testing 2005;5(2):159-168. [CrossRef]
- Berg M. Statista. 2014. iPhone Users Earn More. URL: https://www.statista.com/chart/2638/top-line-platform-stats-for-app-usage-in-the-us/ [accessed 2019-01-28]
- Palan S, Schitter C. Prolific.ac—A subject pool for online experiments. Journal of Behavioral and Experimental Finance 2018 Mar;17:22-27 [FREE Full text] [CrossRef]
- Khadjesari Z, Murray E, Kalaitzaki E, White I, McCambridge J, Thompson S, et al. Impact and costs of incentives to reduce attrition in online trials: two randomized controlled trials. J Med Internet Res 2011 Mar 02;13(1):e26 [FREE Full text] [CrossRef] [Medline]
- John OP, Donahue EM, Kentle RL. The Big Five Inventory. 1991. URL: http://www.sjdm.org/dmidi/Big_Five_Inventory.html [accessed 2019-09-09]
|AUDIT: Alcohol Use Disorders Identification Test|
|DBCI: digital behavior change intervention|
Edited by G Eysenbach; submitted 09.09.19; peer-reviewed by S Miller, J Thrul; comments to author 17.10.19; revised version received 30.10.19; accepted 11.11.19; published 20.11.19
©Olga Perski, Jim Lumsden, Claire Garnett, Ann Blandford, Robert West, Susan Michie. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 20.11.2019.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.