Depression Screening Using Daily Mental-Health Ratings from a Smartphone Application for Breast Cancer Patients

Background
Mobile mental-health trackers are mobile phone apps that gather self-reported mental-health ratings from users. They have received great attention from clinicians as tools to screen for depression in individual patients. While several apps that ask simple questions using face emoticons have been developed, no study has yet examined the validity of their screening performance.
Objective
In this study, we (1) evaluate the potential of a mobile mental-health tracker that uses three daily mental-health ratings (sleep satisfaction, mood, and anxiety) as indicators of depression, (2) discuss three approaches to data processing (ratio, average, and frequency) for generating indicator variables, and (3) examine the impact of adherence to reporting with a mobile mental-health tracker on the accuracy of depression screening.
Methods
We analyzed 5792 sets of daily mental-health ratings collected from 78 breast cancer patients over a 48-week period. Using the Patient Health Questionnaire-9 (PHQ-9) as the measure of true depression status, we conducted a random-effect logistic panel regression and a receiver operating characteristic (ROC) analysis to evaluate the screening performance of the mobile mental-health tracker. In addition, we classified patients into two subgroups based on their adherence level (higher adherence and lower adherence) using a k-means clustering algorithm and compared the screening accuracy between the two groups.
Results
With the ratio approach, the area under the ROC curve (AUC) is 0.8012, indicating that the performance of depression screening using daily mental-health ratings gathered via mobile mental-health trackers is comparable to the results of PHQ-9 tests. Also, the AUC is significantly higher (P=.002) for the higher adherence group (AUC=0.8524) than for the lower adherence group (AUC=0.7234). This result shows that adherence to self-reporting is associated with higher accuracy in depression screening.
Conclusions
Our results support the potential of a mobile mental-health tracker as a tool for screening for depression in practice. Also, this study provides clinicians with a guideline for generating indicator variables from daily mental-health ratings. Furthermore, our results provide empirical evidence for the critical role of adherence to self-reporting, which is crucial information for both doctors and patients.


Random-Effects Logistic Panel Regression Models
A logistic regression is a regression model in which the dependent variable is binary. The model estimates the probability that the dependent variable equals one, given the explanatory variables.
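To make the link between log odds and probabilities concrete, the following minimal Python sketch converts a linear predictor into a probability and back, and checks that a one-unit increase in a covariate multiplies the odds by exp(beta). The coefficient values are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
import math


def log_odds_to_prob(eta: float) -> float:
    """Inverse logit: map a log odds (linear predictor) to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def prob_to_log_odds(p: float) -> float:
    """Logit: map a probability in (0, 1) to its log odds."""
    return math.log(p / (1.0 - p))


# Hypothetical intercept and slope for a single covariate x.
beta0, beta1 = -2.0, 0.5

for x in (0.0, 1.0, 2.0):
    eta = beta0 + beta1 * x  # linear predictor on the log-odds scale
    print(x, round(log_odds_to_prob(eta), 4))

# A one-unit increase in x multiplies the odds by exp(beta1) (the odds ratio).
print(round(math.exp(beta1), 4))
```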
A logistic panel regression is used for panel data, that is, cross-sectional time-series data collected repeatedly from the same individuals. A panel analysis should account for individual heterogeneity because unobserved individual characteristics may affect the estimation results.
One approach to dealing with individual heterogeneity is to employ a fixed-effect model. A fixed-effect model addresses this issue by estimating a separate intercept term $\beta_{0i}$ for each individual i. As a result, the model becomes inefficient when there are many individuals, because every one of these intercepts must be estimated. Also, because the model estimates coefficients using only variation within an individual over time, observations without such variation are excluded, and the coefficients of time-invariant variables, such as gender, cannot be estimated.
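The bookkeeping behind these two limitations can be sketched in a few lines of Python. The sketch does not fit a model; it uses a small hypothetical panel and simply checks, for each individual, whether the outcome and each covariate vary over time. In a conditional (fixed-effect) logit, individuals whose outcome never changes contribute nothing to the likelihood and are dropped, and a covariate that never varies within any individual has no identifiable coefficient. All column layouts and values here are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical long-format panel:
# (individual id, time, depressed 0/1, gender 0/1, mood rating)
panel = [
    (1, 1, 0, 1, 3.0), (1, 2, 1, 1, 2.0), (1, 3, 1, 1, 1.5),
    (2, 1, 0, 0, 4.0), (2, 2, 0, 0, 4.5), (2, 3, 0, 0, 3.5),
    (3, 1, 1, 1, 1.0), (3, 2, 1, 1, 2.0), (3, 3, 1, 1, 1.5),
]


def within_variation(rows, column):
    """Return {individual: bool}: does `column` take more than one value over time?"""
    values = defaultdict(set)
    for row in rows:
        values[row[0]].add(row[column])
    return {i: len(v) > 1 for i, v in values.items()}


outcome_varies = within_variation(panel, 2)
gender_varies = within_variation(panel, 3)
mood_varies = within_variation(panel, 4)

# Individuals whose outcome is constant (always 0 or always 1) are dropped
# by a conditional fixed-effect logit.
kept = sorted(i for i, varies in outcome_varies.items() if varies)
print("individuals used by a fixed-effect logit:", kept)

# Gender never changes within an individual, so its coefficient is not identified.
print("gender varies within any individual:", any(gender_varies.values()))
print("mood varies within any individual:", any(mood_varies.values()))
```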
The other approach is to use a random-effect model. A random-effect model assumes a random sampling process and decomposes the individual intercept $\beta_{0i}$ into two components, $\beta_0$ and $u_i$ ($\beta_{0i} = \beta_0 + u_i$). Here, $\beta_0$ denotes the expected log odds for an average individual ($u_i = 0$) when all other covariates take zero values, and $u_i$ represents individual i's random deviation from the population (individual heterogeneity). A random-effect model does not require every individual intercept term to be estimated, and the coefficients of time-invariant covariates can be estimated. A random-effect logistic regression is formally specified as follows:

$$\log\left(\frac{P(y_{it}=1)}{1-P(y_{it}=1)}\right) = \beta_0 + \sum_{k=1}^{K} \beta_k x_{kit} + u_i + \varepsilon_{it}$$

In our model, $y_{it}$ indicates individual i's depression state at time t (1 = depressed, 0 = normal), and the term on the left-hand side is the log odds that a patient is depressed ($y_{it}=1$).
On the right-hand side, $\beta_0$ denotes the intercept term, the expected log odds for an average individual.
$x_{kit}$ is the observed value of the k-th covariate for individual i at time t (in our model, the daily ratings of sleep satisfaction, mood, and anxiety), and $\beta_k$ is the coefficient of that covariate.
$u_i$ represents unobserved individual heterogeneity, which is treated as random. Specifically, a random-effect model assumes that $u_i$ has zero mean, is independent across individuals, and has constant variance: $E(u_i) = 0$, $\mathrm{cov}(u_i, u_j) = 0$ for $i \neq j$, and $\mathrm{var}(u_i) = \sigma_u^2$.
$\varepsilon_{it}$ denotes the idiosyncratic errors, which vary across t as well as across i.
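One way to see what each term in the specification does is to simulate data from it. The Python sketch below generates a binary panel using the latent-variable form of the random-intercept logit: $y^*_{it} = \beta_0 + \sum_k \beta_k x_{kit} + u_i + \varepsilon_{it}$, with $y_{it} = 1$ when $y^*_{it} > 0$. It assumes, beyond the text, that $u_i$ is normally distributed and that $\varepsilon_{it}$ follows a standard logistic distribution (which makes $P(y_{it}=1)$ a logistic function of the linear predictor). The covariates, coefficients, and the function name `simulate_re_logit` are hypothetical.

```python
import math
import random


def simulate_re_logit(n_individuals, n_periods, beta0, betas, sigma_u, seed=0):
    """Simulate binary panel data from a random-intercept logit model.

    Latent form: y*_it = beta0 + sum_k betas[k] * x_kit + u_i + eps_it,
    y_it = 1 if y*_it > 0. Assumes u_i ~ N(0, sigma_u^2), drawn once per
    individual, and eps_it ~ standard logistic, drawn for every (i, t).
    Returns a list of (i, t, y_it, x_it, u_i) tuples.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n_individuals):
        u_i = rng.gauss(0.0, sigma_u)  # individual heterogeneity, constant over t
        for t in range(n_periods):
            # Hypothetical standardized covariates (e.g., daily ratings).
            x = [rng.gauss(0.0, 1.0) for _ in betas]
            v = rng.random()
            while v == 0.0:  # guard against log(0)
                v = rng.random()
            eps = math.log(v / (1.0 - v))  # inverse-CDF draw from the standard logistic
            latent = beta0 + sum(b * xk for b, xk in zip(betas, x)) + u_i + eps
            rows.append((i, t, int(latent > 0.0), x, u_i))
    return rows


rows = simulate_re_logit(50, 20, beta0=-1.0, betas=[0.8, -0.5, 0.6], sigma_u=1.0)

# Share of "depressed" days per individual; its spread reflects u_i.
by_person = {}
for i, t, y, x, u in rows:
    by_person.setdefault(i, []).append(y)
per_person = [sum(v) / len(v) for v in by_person.values()]
print(min(per_person), max(per_person))
```

Because $u_i$ is drawn once per individual, it shifts all of that individual's log odds by the same amount, which is exactly the individual heterogeneity the random-effect model is designed to absorb.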
Readers can fit a random-effect logistic regression model with most statistical software, such as Stata, R, and SAS. We ran our model in Stata using the command "xtlogit" with the option "re": "xtlogit" fits logistic regression models for panel data, and "re" requests the random-effect estimator. For further information, we recommend the sources listed below.
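For readers who want to see what a random-effect logit estimator does internally, the following Python sketch integrates $u_i$ out of each individual's likelihood and maximizes the resulting marginal log-likelihood. It assumes, beyond the text, that $u_i \sim N(0, \sigma_u^2)$, approximates the integral with plain (non-adaptive) 20-node Gauss-Hermite quadrature, and uses `scipy.optimize.minimize` with BFGS; the idiosyncratic error $\varepsilon_{it}$ enters through the logistic probability. Stata's xtlogit, re also approximates this integral by quadrature, but the function names and settings below are hypothetical and the sketch is not a reproduction of Stata's implementation.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.optimize import minimize
from scipy.special import logsumexp


def re_logit_loglik(params, y, X, ids, n_nodes=20):
    """Marginal log-likelihood of a random-intercept logit model.

    params = [beta0, beta_1..beta_K, log_sigma_u]; y is 0/1 with shape (n_obs,),
    X has shape (n_obs, K), ids are integer individual indices 0..N-1.
    u_i ~ N(0, sigma_u^2) is integrated out by Gauss-Hermite quadrature.
    """
    params = np.asarray(params, dtype=float)
    K = X.shape[1]
    beta0, beta, sigma = params[0], params[1:K + 1], np.exp(params[K + 1])
    nodes, weights = hermgauss(n_nodes)
    eta = beta0 + X @ beta                                     # (n_obs,)
    z = eta[:, None] + np.sqrt(2.0) * sigma * nodes[None, :]   # (n_obs, n_nodes)
    # log Lambda(z) if y = 1, log(1 - Lambda(z)) if y = 0, computed stably
    log_p = -np.logaddexp(0.0, np.where(y[:, None] == 1, -z, z))
    per_ind = np.zeros((ids.max() + 1, n_nodes))
    np.add.at(per_ind, ids, log_p)           # sum over t within each individual
    log_w = np.log(weights) - 0.5 * np.log(np.pi)  # weights / sqrt(pi) sum to 1
    return logsumexp(per_ind + log_w[None, :], axis=1).sum()


def fit_re_logit(y, X, ids, n_nodes=20):
    """Maximize the marginal log-likelihood; returns (beta0, betas, sigma_u, result)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    ids = np.asarray(ids, dtype=int)
    K = X.shape[1]
    x0 = np.zeros(K + 2)  # start at beta = 0 and sigma_u = exp(0) = 1
    res = minimize(lambda p: -re_logit_loglik(p, y, X, ids, n_nodes), x0, method="BFGS")
    return res.x[0], res.x[1:K + 1], float(np.exp(res.x[K + 1])), res
```

A call such as `fit_re_logit(y, X, ids)` returns the intercept, the covariate coefficients, and the estimated standard deviation of $u_i$. Estimating $\log \sigma_u$ rather than $\sigma_u$ keeps the variance positive without constrained optimization; as $\sigma_u$ approaches zero, the marginal likelihood reduces to that of an ordinary pooled logit.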

1. Wooldridge, Jeffrey M. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press, 2010.
2. Conaway, Mark R. "A Random Effects Model for Binary Data." Biometrics (1990): 317-28.
3. Stata manual "xtlogit". http://www.stata.com/manuals13/xtxtlogit.pdf

k-Means Clustering Algorithm
k-means clustering is a method for classifying subjects into homogeneous subgroups, so that observations within a cluster are close to one another and the clusters are far apart; each observation belongs to the cluster with the nearest mean. Formally, k-means clustering partitions n observations into k heterogeneous subsets (clusters) $S = \{S_1, \dots, S_k\}$ so as to minimize the within-cluster sum of squares:

$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2$$

Here, $x_j$ represents the j-th observation, a d-dimensional vector, and $\mu_i$ is the mean of the points in $S_i$.
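The objective above can be evaluated directly for any candidate partition, which makes it easy to compare two groupings. This minimal Python sketch (the function name and the four points are illustrative) computes the within-cluster sum of squares and shows that grouping nearby points gives a much smaller value than mixing distant ones.

```python
def within_cluster_ss(points, labels, k):
    """Sum over clusters of squared Euclidean distances to the cluster mean."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        dim = len(members[0])
        mean = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum(sum((p[d] - mean[d]) ** 2 for d in range(dim)) for p in members)
    return total


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = [0, 0, 1, 1]  # nearby points share a cluster
bad = [0, 1, 0, 1]   # each cluster mixes distant points

print(within_cluster_ss(points, good, 2))  # 4.0
print(within_cluster_ss(points, bad, 2))   # 100.0
```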
To help readers understand how the k-means clustering algorithm works, a simple graphical example is illustrated below, assuming that each observation is two-dimensional.

Figure 1. The procedure of k-means clustering
First, the k-means clustering algorithm randomly assigns each observation to one of k sets (A in Figure 1). We chose three clusters in the example, but researchers can select any number (k) of clusters. Second, the algorithm calculates the mean of each cluster as its centroid, the point that minimizes the within-cluster sum of squares (B in Figure 1). Third, each data point is reassigned to the cluster whose centroid is closest to it (C in Figure 1). Fourth, the mean of each cluster is recalculated (D in Figure 1). The third and fourth steps (C and D in Figure 1) are repeated until the cluster assignments no longer change.
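The four steps can be sketched in plain Python as follows. The sketch follows the random-partition start described above; the handling of an empty cluster (re-seeding it with a randomly chosen point) is an assumption added so the loop never divides by zero, and the function name is hypothetical.

```python
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd-style k-means following steps A-D of Figure 1.

    Step A: randomly assign every point to one of k sets.
    Steps B/D: compute each cluster's mean (centroid).
    Step C: reassign each point to its nearest centroid.
    Steps C and D repeat until no assignment changes (or max_iter is reached).
    """
    rng = random.Random(seed)
    dim = len(points[0])
    labels = [rng.randrange(k) for _ in points]  # step A
    centroids = []
    for _ in range(max_iter):
        centroids = []
        for c in range(k):  # steps B / D
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:  # assumption: re-seed an empty cluster with a random point
                members = [rng.choice(points)]
            centroids.append([sum(p[d] for p in members) / len(members) for d in range(dim)])
        new_labels = [  # step C: nearest centroid by squared Euclidean distance
            min(range(k), key=lambda c: sum((p[d] - centroids[c][d]) ** 2 for d in range(dim)))
            for p in points
        ]
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
    return labels, centroids


pts = [(-1.0,), (-0.5,), (0.0,), (0.5,), (1.0,), (9.0,), (9.5,), (10.0,), (10.5,), (11.0,)]
labels, cents = kmeans(pts, 2, seed=3)
print(labels, cents)
```

Because the result depends on the random starting partition, k-means can stop at a local optimum; software implementations therefore commonly run several random starts and keep the solution with the smallest within-cluster sum of squares.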
Most statistical software packages, such as Stata, R, and SAS, provide k-means clustering modules. Stata users can run a k-means cluster analysis with the command "cluster kmeans". For further information about k-means clustering, we recommend the references listed below.
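Python users can run the same kind of analysis with scikit-learn's `KMeans` class. The sketch below mirrors the two-group adherence split described in the abstract, but the adherence values (a hypothetical share of days with a self-report for ten patients) are made up for illustration and are not data from this study.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical adherence rates (share of days with a self-report) for ten patients.
adherence = np.array([0.95, 0.90, 0.88, 0.92, 0.85, 0.30, 0.25, 0.40, 0.35, 0.20])

# n_init=10 runs ten random starts and keeps the lowest within-cluster sum of squares.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(adherence.reshape(-1, 1))  # scikit-learn expects a 2-D array

# Call the cluster with the larger centre the "higher adherence" group.
high = int(np.argmax(km.cluster_centers_.ravel()))
group = np.where(labels == high, "higher", "lower")
print(group)
```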