This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
SMS text messaging is an inexpensive, private, and scalable technology-mediated assessment mode that can alleviate many of the barriers that the safety net population faces in receiving depression screening. Some existing studies suggest that technology-mediated assessment encourages self-disclosure of sensitive health information such as depressive symptoms, whereas other studies show the opposite effect.
This study aimed to evaluate the validity of using SMS text messaging to screen for depression and related conditions, including anxiety and functional disability, in a low-income, culturally diverse safety net primary care population.
This study used a randomized design with 4 study groups that permuted the order of the SMS text messaging and gold standard interview (INTW) assessments. Participants were recruited from among the participants of the prior Diabetes-Depression Care-management Adoption Trial (DCAT). Depression was screened using the 2-item and 8-item Patient Health Questionnaire (PHQ-2 and PHQ-8, respectively). Anxiety was screened using the 2-item Generalized Anxiety Disorder scale (GAD-2), and functional disability was assessed using the Sheehan Disability Scale (SDS). Participants chose to complete the assessments in English or Spanish. Internal consistency and test-retest reliability were evaluated using Cronbach alpha and the intraclass correlation coefficient (ICC), respectively. Concordance was evaluated using the ICC, the kappa statistic, the area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity. A regression analysis was conducted to examine the association between participant characteristics and the differences in scores between the SMS text messaging and INTW assessment modes.
Overall, 206 participants (average age 57.1 [SD 9.18] years; females: 119/206, 57.8%) were enrolled. All measurements except the SMS text messaging–assessed PHQ-2 showed Cronbach alpha values ≥.70, indicating acceptable to good internal consistency. All measurements except the INTW-assessed SDS had ICC values ≥0.75, indicating good to excellent test-retest reliability. For concordance, the PHQ-8 had an ICC of 0.73 and AUROC of 0.93, indicating good concordance. The kappa statistic, sensitivity, and specificity for major depression (PHQ-8 ≥8) were 0.43, 0.60, and 0.86, respectively. The concordance of the shorter PHQ-2, GAD-2, and SDS scales was poor to fair. The regression analysis revealed that a higher level of personal depression stigma was associated with reporting higher SMS text messaging–assessed PHQ-8 and GAD-2 scores than the INTW-assessed scores. The analysis also determined that the differences in the scores were associated with marital status and personality traits.
Depression screening conducted using the longer PHQ-8 scale via SMS text messaging demonstrated good internal consistency, test-retest reliability, and concordance with the gold standard INTW assessment mode. However, care must be taken when deploying shorter scales via SMS text messaging. The regression analysis further suggested that a technology-mediated assessment, such as SMS text messaging, may create a private space with less pressure from personal depression stigma and therefore encourage self-disclosure of depressive symptoms.
ClinicalTrials.gov NCT01781013; https://clinicaltrials.gov/ct2/show/NCT01781013
RR2-10.2196/12392
Depression is an underdiagnosed comorbidity that can negatively affect functional status, morbidity/mortality, and cost for the treatment of chronic illnesses, such as diabetes [
Nevertheless, there are significant barriers to adopting mass depression screening, particularly in underserved, predominantly minority patients with chronic illnesses. This patient population has an increased risk of depression and often prefers safety net primary care over specialty psychiatric care when seeking mental health care [
The increasing usage of mobile services, particularly SMS text messaging, provides opportunities to overcome the barriers to adopting universal depression screening in underserved populations. The use of SMS text messaging is highly prevalent globally; among the 4 billion mobile phones in use, 3.05 billion (75%) are SMS text messaging–enabled [
Previous studies have tested the validity of conducting standardized depression screening, such as the Patient Health Questionnaire (PHQ), by using paper-based self-reported assessment [
To fill this knowledge gap, this study examined the validity of using standardized tools to assess depression and its related conditions via SMS text messaging vs the gold standard INTW assessment in underserved, predominantly minority patients from a large safety net primary care system. This study examined the internal consistency, test-retest reliability, and concordance of the 2 modes of assessment. Patient characteristics, including demographics (age, gender, race/ethnicity, and marital status), technology use, and psychological traits (personality, cognitive vulnerability to depression, and depression stigma), were further examined in a regression analysis to explore their associations with the differences between the 2 modes of assessment.
This study protocol was approved by the Institutional Review Board of the University of Southern California and has been published in
The SMS text messaging/INTW and INTW/SMS text messaging groups were used to examine the concordance between the SMS text messaging and INTW assessments. The SMS text messaging/SMS text messaging and INTW/INTW groups were used to evaluate test-retest reliability. Validity of the INTW assessment has been established in prior studies [
As described in the study protocol paper [
The depression screening was conducted using the 2-item and 8-item PHQ (PHQ-2 and PHQ-8, respectively), which are widely used depression screening tools in primary care and general populations [
Participant characteristics included demographics (such as age, gender, race/ethnicity, language, marital status, and education), personality, cognitive diathesis to depression, depression stigma, and mobile phone use. Personality was measured by using the Ten-Item Personality measure of the Big Five personality scale: extraversion, agreeableness, conscientiousness, emotional stability, and openness to experience [
The participant characteristics were summarized using mean and standard deviation for continuous variables and frequency and percentage for dichotomous variables. The internal consistency was evaluated by using Cronbach alpha. The test-retest reliability of the SMS text messaging and INTW assessments was evaluated by using ICC. The concordance between the SMS text messaging and INTW assessments was evaluated by using ICC, a kappa statistic, an area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity. ICC was used to measure the consistency or reproducibility of the SMS text messaging and INTW assessments. AUROC, sensitivity, and specificity were used to measure discriminative validity. The kappa statistic was used to measure interrater agreement. The kappa statistic, sensitivity, and specificity were computed using the threshold levels of PHQ-2 ≥3, PHQ-8 ≥8, GAD-2 ≥3, and SDS ≥12. The differences in the scores between the SMS text messaging and INTW assessments were summarized by using means and standard deviations. The differences were detected using a paired 2-tailed
A regression analysis was conducted to further examine the associations between the participant characteristics and the differences in the scores between the SMS text messaging and INTW assessments. To identify the most predictive variables, all patient characteristics, as summarized in
All statistical analyses were conducted using R, version 3.5.2 (R Core team) [
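As an illustration of the internal consistency measure named in the statistical analysis, Cronbach alpha can be computed from item-level responses roughly as follows. This is a minimal Python sketch, not the authors' R code; the function name `cronbach_alpha` and the sample responses are hypothetical and chosen only to show the arithmetic.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach alpha for a list of respondents, each a list of item scores."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses to a 2-item scale scored 0-3 per item (not study data).
toy_responses = [[0, 1], [1, 0], [2, 3], [3, 2]]
alpha = cronbach_alpha(toy_responses)
print(f"Cronbach alpha = {alpha:.2f}")
```

With only 2 items, as in the PHQ-2 and GAD-2, alpha depends entirely on how strongly the 2 items covary, which is one reason short scales tend to show lower values than longer ones.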
Summary of the participant characteristics.
Variable | All (N=206) | SMS text messaging/INTWa (n=52) | SMS text messaging/SMS text messaging (n=53) | INTW/SMS text messaging (n=49) | INTW/INTW (n=52)
Age (years), mean (SD) | 57.11 (9.18) | 58.54 (8.60) | 55.35 (10.06) | 57.24 (8.08) | 57.33 (9.76)
Female, n (%) | 119 (57.8) | 33 (63.5) | 34 (64.2) | 26 (53.1) | 26 (50.0)
Latino, n (%) | 192 (93.2) | 50 (96.2) | 51 (96.2) | 44 (91.7) | 47 (92.2)
Preferred Spanish language, n (%) | 160 (77.7) | 39 (75.0) | 47 (88.7) | 38 (77.6) | 36 (69.2)
Less than high-school level education, n (%) | 131 (63.6) | 31 (59.6) | 33 (62.3) | 35 (71.4) | 32 (61.5)
Extraversion score, mean (SD) | 3.84 (1.15) | 3.84 (1.23) | 4.03 (1.00) | 3.68 (1.05) | 3.81 (1.28)
Agreeableness score, mean (SD) | 6.43 (0.87) | 6.45 (0.84) | 6.75 (0.53) | 6.17 (0.95) | 6.31 (1.01)
Conscientiousness score, mean (SD) | 5.51 (1.48) | 5.60 (1.24) | 5.52 (1.73) | 5.36 (1.56) | 5.55 (1.37)
Emotional stability score, mean (SD) | 5.47 (1.44) | 5.37 (1.47) | 5.47 (1.48) | 5.43 (1.42) | 5.61 (1.43)
Openness to experience score, mean (SD) | 3.77 (1.23) | 3.61 (1.38) | 3.90 (1.15) | 3.72 (1.24) | 3.83 (1.17)
Dysfunctional attitude scale, mean (SD) | 0.58 (0.70) | 0.55 (0.65) | 0.65 (0.76) | 0.50 (0.66) | 0.61 (0.74)
Personal depression stigma, mean (SD) | 2.16 (1.08) | 2.00 (1.12) | 2.31 (1.11) | 2.13 (0.96) | 2.21 (1.13)
Perceived depression stigma, mean (SD) | 3.20 (0.91) | 3.07 (1.02) | 3.40 (0.73) | 3.14 (0.99) | 3.16 (0.86)
 | 131 (63.6) | 33 (63.5) | 39 (73.6) | 33 (67.3) | 26 (50.0)
0 | 3 (1.5) | 0 (0.0) | 0 (0.0) | 1 (2.1) | 2 (3.8)
1 | 13 (6.4) | 3 (5.8) | 1 (1.9) | 1 (2.1) | 8 (15.4)
2 | 56 (27.6) | 16 (30.8) | 12 (23.1) | 12 (25.5) | 16 (30.8)
3 | 5 (2.5) | 0 (0.0) | 4 (7.7) | 0 (0.0) | 1 (1.9)
4 | 126 (62.1) | 33 (63.5) | 35 (67.3) | 33 (70.2) | 25 (48.1)
 | 86 (41.7) | 22 (42.3) | 26 (49.1) | 22 (44.9) | 16 (30.8)
0 purposes | 22 (10.8) | 6 (11.5) | 2 (3.8) | 4 (8.3) | 10 (19.2)
1 purpose | 96 (47.1) | 24 (46.2) | 24 (46.2) | 22 (45.8) | 26 (50.0)
2 purposes | 54 (26.5) | 14 (26.9) | 17 (32.7) | 14 (29.2) | 9 (17.3)
3 purposes | 32 (15.7) | 8 (15.4) | 9 (17.3) | 8 (16.7) | 7 (13.5)
aINTW: interviewer.
Participants were recruited from June 2017 to November 2017, which led to the enrollment of 206 participants: 52 in the SMS text messaging/INTW, 53 in the SMS text messaging/SMS text messaging, 49 in the INTW/SMS text messaging, and 52 in the INTW/INTW groups. The average age of the participants was 57.1 years, 57.8% (119/206) were females, and 93.2% (192/206) were Latinos. In addition, 77.7% (160/206) chose Spanish as their preferred language. Compared with the personality norms from a large sample [
The internal consistency and test-retest reliability of the INTW and SMS text messaging assessments were evaluated by using Cronbach alpha and ICC, respectively. As shown in
Internal consistency and test-retest reliability of the interviewer and SMS text messaging assessments.
Assessment mode and measurement | Internal consistency (Cronbach alpha) | Test-retest reliability (intraclass correlation coefficient)
Interviewer | |
Depression (PHQ-2a) | .71 | 0.76
Depression (PHQ-8b) | .86 | 0.78
Anxiety (GAD-2c) | .82 | 0.75
Functional disability (SDSd) | .80 | 0.47
SMS text messaging | |
Depression (PHQ-2) | .68 | 0.74
Depression (PHQ-8) | .86 | 0.81
Anxiety (GAD-2) | .71 | 0.73
Functional disability (SDS) | .86 | 0.82
aPHQ-2: 2-item Patient Health Questionnaire.
bPHQ-8: 8-item Patient Health Questionnaire.
cGAD-2: 2-item Generalized Anxiety Disorder scale.
dSDS: Sheehan Disability Scale.
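To make the test-retest figures above more concrete, the sketch below computes an intraclass correlation coefficient from repeated scores. The article does not state which ICC form was used, so the one-way random-effects form, ICC(1,1), is an assumption here; the function name `icc_oneway` and the scores are hypothetical.

```python
def icc_oneway(scores):
    """One-way random-effects ICC(1,1); scores is a list of per-participant tuples."""
    n = len(scores)      # participants
    k = len(scores[0])   # repeated assessments per participant
    if n < 2 or k < 2:
        raise ValueError("need at least 2 participants and 2 assessments")
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    # Between-participant mean square.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-participant mean square (disagreement between repeated assessments).
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, subject_means) for x in row
    ) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("scores have zero variance")
    return (ms_between - ms_within) / denominator


# Hypothetical first and second PHQ-8 totals for 3 participants (not study data).
# The second assessment is uniformly 2 points higher than the first.
toy_scores = [(0, 2), (4, 6), (8, 10)]
icc = icc_oneway(toy_scores)
print(f"ICC(1,1) = {icc:.3f}")
```

In this toy example the 2 assessments are perfectly correlated, yet the ICC falls below 1 because this form penalizes the systematic 2-point shift as within-participant disagreement.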
Concordance between the interviewer and SMS text messaging assessments.
Measurement | Interviewer assessment, mean (SD) | SMS text messaging assessment, mean (SD) | P valuea | Intraclass correlation coefficient | Kappa valueb | Area under the receiver operating characteristic curve | Sensitivityb | Specificityb
Patient Health Questionnaire (2-item) | 0.67 (1.27) | 1.23 (1.79) | .13 | 0.32 | 0.19 | 0.84 | 0.34 | 0.89
Patient Health Questionnaire (8-item) | 3.29 (4.47) | 3.89 (4.20) | .39 | 0.73 | 0.43 | 0.93 | 0.60 | 0.86
Anxiety (2-item Generalized Anxiety Disorder scale) | 0.97 (1.49) | 1.16 (1.63) | .64 | 0.54 | 0.35 | 0.76 | 0.50 | 0.89
Functional disability (Sheehan Disability Scale) | 8.09 (6.40) | 6.83 (8.03) | .16 | 0.54 | 0.13 | 0.94 | 0.59 | 1.00
a
bThe kappa statistic, sensitivity, and specificity were evaluated using a cutoff point of 3 for the 2-item Patient Health Questionnaire and 2-item Generalized Anxiety Disorder scale, 8 for the 8-item Patient Health Questionnaire, and 12 for the Sheehan Disability Scale.
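The discriminative validity measures in the concordance table can be sketched as follows, treating the interviewer score at or above the cutoff as the reference standard and the SMS text messaging score as the index test. This is an illustrative Python sketch under that assumption, not the authors' R code; the function names and the paired scores are hypothetical.

```python
def sensitivity_specificity(pairs, cutoff):
    """pairs: (interviewer_score, sms_score); the same cutoff is applied to both."""
    tp = fn = fp = tn = 0
    for intw, sms in pairs:
        reference_positive = intw >= cutoff
        test_positive = sms >= cutoff
        if reference_positive and test_positive:
            tp += 1
        elif reference_positive:
            fn += 1
        elif test_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both reference-positive and reference-negative cases")
    return tp / (tp + fn), tn / (tn + fp)


def auroc(pairs, cutoff):
    """Probability that a reference-positive case has the higher SMS score (ties count half)."""
    positives = [sms for intw, sms in pairs if intw >= cutoff]
    negatives = [sms for intw, sms in pairs if intw < cutoff]
    if not positives or not negatives:
        raise ValueError("need both reference-positive and reference-negative cases")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in positives
        for q in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical paired PHQ-8 totals (interviewer, SMS); not study data.
toy_pairs = [(10, 12), (9, 6), (8, 9), (2, 3), (0, 1), (5, 8), (3, 2), (1, 0)]
sens, spec = sensitivity_specificity(toy_pairs, cutoff=8)
auc = auroc(toy_pairs, cutoff=8)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUROC={auc:.2f}")
```

Sensitivity and specificity depend on the cutoff applied to the SMS score, whereas the AUROC summarizes ranking across all possible SMS cutoffs, so a scale can pair a high AUROC with modest sensitivity at one fixed threshold.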
A regression analysis was performed to further examine the associations between the participant characteristics and the differences in the INTW and SMS text messaging assessment scores.
Linear regression analysis using the top 4 predictors selected by least absolute shrinkage and selection operator to predict the differences between the interviewer and SMS text messaging assessments.
Predictors | Difference between interviewer and SMS text messaging assessments, estimate of coefficient (95% CI)
 | Patient Health Questionnaire (2-item)a | Patient Health Questionnaire (8-item)b | Generalized Anxiety Disorder scale (2-item)c | Sheehan Disability Scaled
Conscientiousness score ≤4.5 | 1.76 (0.58 to 2.94)e | 2.39 (0.27 to 4.51)e | 1.09 (0.09 to 2.05)e | −3.75 (−8.57 to 1.07)
Emotional stability score ≤4.5 | −1.45 (−2.54 to −0.36)e | —f | −1.09 (−2.04 to −0.14)e | —
Agreeableness score=7 | 1.33 (0.17 to 2.49)e | 2.35 (0.38 to 4.32)e | — | 2.74 (−1.88 to 7.36)
Openness to experience score ≥4.5 | — | — | — | 5.51 (0.50 to 10.51)e
Personal depression stigma | — | −0.94 (−1.87 to −0.10)e | −0.50 (−0.98 to −0.02)e | —
Dysfunctional attitude score | — | — | −0.36 (−1.14 to 0.42) | —
Married | — | −2.37 (−4.39 to −0.34)e | — | —
Gender | 0.62 (−0.50 to 1.74) | — | — | 1.76 (−2.75 to 6.26)
a
b
c
d
e
fSome cells are empty because the corresponding variables are not selected into the regression model.
This study examined the validity of screening for depression and related comorbid conditions, including anxiety and functional disability, via the SMS text messaging and INTW assessments for underserved, predominantly minority safety net primary care patients. Although the longer PHQ-8 depression screening scale had good internal consistency, test-retest reliability, and concordance, the 3 shorter scales, ie, the PHQ-2, GAD-2, and SDS, had poor-to-moderate levels of concordance between the SMS text messaging and INTW assessments. In particular, the PHQ-2 depression screening scale had poor concordance, as measured by ICC and Cohen kappa, between the SMS text messaging and INTW assessments. The kappa value of the SDS also indicated poor agreement. The interrater agreement as measured using Cohen kappa would improve if different cutoff points were assigned based on the modes of assessment. The kappa value for the PHQ-2 depression screening scale would improve from 0.19 (indicating poor agreement) to 0.52 (indicating moderate agreement) if the cutoff points were changed from 3 for both modes of assessment to 2 for the INTW assessment and 3 for the SMS text messaging assessment. Similarly, the kappa value for the SDS would improve from 0.13 (indicating poor agreement) to 0.49 (indicating moderate agreement) if the cutoff points were changed from 12 for both modes of assessment to 12 for the INTW assessment and 9 for the SMS text messaging assessment.
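The effect of mode-specific cutoff points on Cohen kappa can be illustrated with a small sketch. The paired scores below are hypothetical and were constructed so that SMS text messaging scores run higher than interviewer scores; the function names are likewise illustrative, and the resulting kappa values are not the study's.

```python
def cohen_kappa(ratings_a, ratings_b):
    """Cohen kappa for two equal-length lists of binary (True/False) classifications."""
    n = len(ratings_a)
    if n == 0 or n != len(ratings_b):
        raise ValueError("ratings must be non-empty and of equal length")
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    p_a = sum(ratings_a) / n
    p_b = sum(ratings_b) / n
    # Agreement expected by chance from the marginal positive rates.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def kappa_at_cutoffs(pairs, intw_cutoff, sms_cutoff):
    """Dichotomize each mode at its own cutoff, then compute kappa."""
    intw_positive = [intw >= intw_cutoff for intw, _ in pairs]
    sms_positive = [sms >= sms_cutoff for _, sms in pairs]
    return cohen_kappa(intw_positive, sms_positive)


# Hypothetical paired PHQ-2 totals (interviewer, SMS); not study data.
toy_pairs = [(2, 3), (2, 4), (3, 5), (0, 1), (1, 1),
             (0, 0), (2, 3), (1, 3), (0, 2), (3, 3)]
kappa_same = kappa_at_cutoffs(toy_pairs, intw_cutoff=3, sms_cutoff=3)
kappa_mode_specific = kappa_at_cutoffs(toy_pairs, intw_cutoff=2, sms_cutoff=3)
print(f"same cutoff: {kappa_same:.2f}, mode-specific: {kappa_mode_specific:.2f}")
```

When one mode systematically yields higher scores, a shared cutoff classifies many participants differently across modes, and lowering the cutoff for the lower-scoring mode realigns the two classifications.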
This study found that participants reported more symptoms of depression and anxiety via the SMS text messaging assessment than the INTW assessment. In contrast, less functional disability was reported via the SMS text messaging assessment than the INTW assessment. The regression analysis revealed that a higher level of personal depression stigma was associated with reporting more symptoms of depression and anxiety via the SMS text messaging assessment than the INTW assessment. This finding supports the hypothesis that SMS text messaging creates a private and secure environment with less social desirability bias and therefore encourages people to self-report stigmatized symptoms of depression and anxiety [
This study had a few limitations. First, the experience that participants gained in the prior DCAT study may have made them more familiar with technology-mediated assessments than the average person in the targeted study population. Nevertheless, the 4-year interval between the DCAT study (conducted during 2010-2013) and this study (conducted in 2017) is long enough that it likely reduced any residual influence of the DCAT assessments. Second, the study participants were predominantly Latinos, which may limit the generalizability of the results to other safety net primary care populations, particularly those of African American patients. Finally, the statistical associations revealed by the regression analysis need further exploration to identify the causal mechanisms underlying the self-reporting of sensitive health information via different modes of assessment.
This study examined the validity of screening for depression and related conditions via SMS text messaging vs interview assessment for underserved, predominantly minority safety net primary care patients. The depression screening conducted using the longer PHQ-8 scale via SMS text messaging demonstrated good internal consistency, test-retest reliability, and concordance with the gold standard INTW assessment mode. Deploying shorter scales via SMS text messaging should be done cautiously. The regression analysis further suggested that technology-mediated assessments, such as SMS text messaging, may create a private space with less pressure from personal depression stigma and therefore encourage self-disclosure of depressive symptoms. Other characteristics, such as personality traits and certain demographic characteristics, were also associated with the difference between the technology-mediated and INTW assessment modes.
AUROC: area under the receiver operating characteristic curve
DAS: Dysfunctional Attitudes Scale
DCAT: Diabetes-Depression Care-management Adoption Trial
DSS: Depression Stigma Scale
GAD: Generalized Anxiety Disorder
ICC: intraclass correlation coefficient
INTW: interviewer
LASSO: least absolute shrinkage and selection operator
PHQ: Patient Health Questionnaire
SDS: Sheehan Disability Scale
The Suzanne Dworak-Peck School of Social Work at the University of Southern California funded this study.
None declared.