Estimating Determinants of Attrition in Eating Disorder Communities on Twitter: An Instrumental Variables Approach

Background The use of social media as a key health information source has increased steadily among people affected by eating disorders (EDs). Research has examined characteristics of individuals engaging in online communities, whereas little is known about discontinuation of engagement and the phenomenon of participants dropping out of these communities. Objective This study aimed to investigate the characteristics of dropout behaviors among eating disordered individuals on Twitter and to estimate the causal effects of personal emotions and social networks on dropout behaviors. Methods Using a snowball sampling method, we collected a set of individuals who self-identified with EDs in their Twitter profile descriptions, as well as their tweets and social networks, leading to 241,243,043 tweets from 208,063 users. Individuals’ emotions are measured from their language use in tweets using an automatic sentiment analysis tool, and network centralities are measured from users’ following networks. Dropout statuses of users are observed in a follow-up period 1.5 years later (from February 11, 2016 to August 17, 2017). Linear and survival regression instrumental variables models are used to estimate the effects of emotions and network centrality on dropout behaviors. The average levels of attributes among an individual’s followees (ie, people who are followed by the individual) are used as instruments for the individual’s attributes. Results Eating disordered users have relatively short periods of activity on Twitter with one half of our sample dropping out at 6 months after account creation. Active users show more negative emotions and higher network centralities than dropped-out users. Active users tend to connect to other active users, whereas dropped-out users tend to cluster together. Estimation results suggest that users’ emotions and network centralities have causal effects on their dropout behaviors on Twitter. 
More specifically, users with positive emotions are more likely to drop out and have shorter lasting periods of activity online than users with negative emotions, whereas central users in a social network have longer lasting participation than peripheral users. Findings on users’ tweeting interests further show that users who attempt to recover from EDs are more likely to drop out than those who promote EDs as a lifestyle choice. Conclusions Presence in online communities is strongly determined by the individual’s emotions and social networks, suggesting that studies analyzing and trying to draw condition and population characteristics through online health communities are likely to be biased. Future research needs to examine in more detail the links between individual characteristics and participation patterns if better understanding of the entire population is to be achieved. At the same time, such attrition dynamics need to be acknowledged and controlled when designing online interventions so as to accurately capture their intended populations.


Data Statistics
Figure S1 shows a diagram of our data collection and analysis processes. Table S1 shows descriptive statistics of users stratified by the dropout states observed in our second observation period. The differences between dropouts and non-dropouts are measured with the Mann-Whitney U test, a nonparametric test with the null hypothesis that the distributions of two populations are equal. This test does not assume a specific distribution in the data (e.g., a normal distribution as in the t-test), which makes it well suited to social media statistics that often follow a non-normal (e.g., power-law) distribution [1]. For intuitive comparisons, we report the standardized U as a z-score. Moreover, the Bonferroni correction is used to counteract the problem of multiple comparisons. Compared to dropouts, non-dropouts show more negative emotions and higher network centralities. The network centralities are measured on a following network containing 208,063 nodes and 1,347,056 directed edges. All nodes are connected in a single weakly connected component, and the average degree of the network is 6.5.
Figure S1: Diagram of data collection and analysis procedures.
Figure S2 shows the dates when users joined and dropped out of Twitter. Most users were active during 2012 to 2014, during which 1,944 users (67%) joined Twitter. Two notable peaks in the curve of last posting time occur at the dates of our two observations. The first peak indicates that some users were lost to follow-up (e.g., accounts were deleted), and the second peak indicates that many users were still actively posting tweets when our observations ended. Figure S3 shows demographic information of ED users, extracted from users' Twitter profile descriptions using regular expressions. To avoid noise (i.e., extremely small or large values), we only consider users whose values are in the 95% confidence intervals of the whole distribution of a statistic (except for gender).
We obtain 357 users who self-reported gender information, and 84% of these users (n = 300) are female (see Figure S3(a)). In total, 1,030 users reported their ages. After excluding those with extremely small and large values of age, Figure S3(b) shows the distribution of age among ED users (n = 1,015); the mean age of these users is 17.3. The predominance of female and young users aligns with clinical evidence that EDs often develop among young females [2,3]. Figures S3(c) and (d) further show the distributions of height (with a mean of µ = 165.1cm) and weight (µ = 57.6kg for current weight and µ = 49.4kg for goal weight) among ED users. Comparing the distributions of weights, we see that goal weights are smaller than current weights. This implies that most ED users attempt to lose weight. Moreover, we calculate BMI (Body Mass Index) for users who reported both height and weight. Figure S3(e) shows the distributions of users' current (µ = 21.1) and goal (µ = 18.4) BMIs. Compared to the reference values of BMI for girls at the age of 17.3 years from the World Health Organization (WHO) [4], we find that 58% of users (n = 574 among 991 users) have a current BMI lower than 21.1 (the reference value of median BMI), and 55% of users (n = 619 among 1,123 users) have a goal BMI lower than 18.5 (the reference BMI for underweight). Note that we do not include users' demographic attributes in our estimation models because only a small fraction of users report these attributes (e.g., only 357 of 3,380 users with gender information).
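The group comparison described earlier in this section (a standardized Mann-Whitney U with a Bonferroni adjustment) can be illustrated with a minimal sketch. It assumes the normal approximation with a tie correction and a two-sided p-value; the function names `midranks`, `mann_whitney_z`, and `bonferroni` are illustrative and are not taken from the study's code.

```python
import math
from collections import Counter


def midranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_z(x, y):
    """Return (U of x, standardized z, two-sided p) via the normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    n = n1 + n2
    # Tie correction: each group of t tied values contributes t^3 - t.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))
    return u1, z, p


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

A negative z here means that values in the first group tend to be smaller than those in the second. Whether the study applied a tie or continuity correction is not stated in the text, so the exact z-scores in Table S1 may differ from what this sketch produces.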

Null Model
We use a null model [5] to test the statistical significance of the homophily pattern. Specifically, we randomly shuffle users' dropout states and re-measure the homophily coefficient r [6] based on the shuffled states. These coefficients can be viewed as observed values of a random variable. Repeating this procedure 3,000 times, we obtain an empirical distribution of homophily coefficients with a mean of µ = 0 and a standard deviation of σ = 0.005. The z-score for the observed homophily (i.e., r = 0.09 in the main text) under this baseline distribution is z = 16.84 and
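The shuffling procedure can be sketched as follows. This is a hedged illustration: it assumes the homophily coefficient r is Newman's assortativity coefficient for a categorical node attribute on directed edges (the text cites it only as [6]), and the names `assortativity` and `null_model_z` are hypothetical.

```python
import random
import statistics


def assortativity(edges, state):
    """Assumed form of r: Newman's attribute assortativity on directed edges."""
    cats = sorted(set(state.values()))
    m = len(edges)
    e = {(a, b): 0.0 for a in cats for b in cats}
    for u, v in edges:
        e[(state[u], state[v])] += 1.0 / m
    trace = sum(e[(c, c)] for c in cats)
    out_frac = {c: sum(e[(c, d)] for d in cats) for c in cats}
    in_frac = {c: sum(e[(d, c)] for d in cats) for c in cats}
    expected = sum(out_frac[c] * in_frac[c] for c in cats)
    return (trace - expected) / (1.0 - expected)


def null_model_z(edges, state, n_shuffles=3000, seed=0):
    """Shuffle states across nodes and z-score the observed coefficient."""
    rng = random.Random(seed)
    observed = assortativity(edges, state)
    nodes = list(state)
    labels = [state[n] for n in nodes]
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(labels)  # keeps the number of dropouts fixed
        null.append(assortativity(edges, dict(zip(nodes, labels))))
    mu = statistics.mean(null)
    sigma = statistics.pstdev(null)
    if sigma == 0:
        raise ValueError("null distribution has zero variance")
    return observed, mu, sigma, (observed - mu) / sigma
```

Shuffling labels rather than rewiring edges keeps both the network structure and the overall dropout rate fixed, so only the alignment between states and links is randomized.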

Specifications of Data Censoring Methods
We tune the parameters of our data censoring methods based on users' activities before and after our first observation. We apply each censoring method with different parameters to data on users' activities before our first observation to estimate users' dropout states, and choose the parameters that achieve the best agreement between the estimated dropout states and the states observed in our second observation. Searching over π ∈ [1, 300] days in the identical-interval censoring method, we find that the optimal parameter is π = 101, with Cohen's κ = .68 between the estimated and observed dropout states of users. This good agreement illustrates the effectiveness of the censoring approach. Similarly, searching over a parameter space of π ∈ [0, 200] days and λ ∈ [0, 1], we find that the optimal parameters in the personalized-interval censoring method are π = 161 and λ = 0.6, again with Cohen's κ = .68. We use these parameters to censor the dropout states of users who were active in the second observation. Figure S4 shows the Kaplan-Meier curves of users' survival time from our first observation until our second observation using the two censoring methods. The median survival time of users is 13 months in both methods, and no significant difference is found between the two types of censoring (P = .93 in a log-rank test).
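The parameter search for the identical-interval method can be sketched as a grid search that maximizes Cohen's κ. The decision rule used here (a user is estimated to have dropped out when the days since their last post exceed π) is an assumption made for illustration; this section does not spell out the rule, and the names `cohens_kappa` and `best_interval` are hypothetical.

```python
def cohens_kappa(a, b):
    """Cohen's kappa between two equally long label sequences."""
    a, b = list(a), list(b)
    n = len(a)
    labels = set(a) | set(b)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    p_expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    if p_expected == 1.0:
        # Kappa is undefined here; returning 0.0 is a convention of this sketch.
        return 0.0
    return (p_observed - p_expected) / (1.0 - p_expected)


def best_interval(days_inactive, observed_dropout, candidates):
    """Return (pi, kappa) for the first pi that maximizes agreement."""
    best = None
    for pi in candidates:
        # Assumed rule: inactive for more than pi days means dropped out.
        estimated = [d > pi for d in days_inactive]
        kappa = cohens_kappa(estimated, observed_dropout)
        if best is None or kappa > best[1]:
            best = (pi, kappa)
    return best
```

The personalized-interval method would add a second loop over λ; it is omitted because the way λ enters the rule is not given in this section.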

Posting Interests of Users
To better understand the relationship between emotions and dropout, we examine associations of interests among users with different dropout statuses and emotional states. This follows past evidence that community interest is the primary motivating factor for participation in online communities [7] and people's concerns/interests reflect their emotional states [8]. Since hashtags are explicit topic signals on Twitter and have been shown to strongly indicate users' interests [9], we characterize users' interests based on hashtags used in their tweets.
Figure S5: The co-occurrence network of the most popular hashtags used by all ED users. Each node is a hashtag, and node size is proportional to the number of users who posted the tag. Node color is assigned based on the frequency of a tag so that high frequency is darker and low frequency is lighter. Edge width is proportional to the number of co-occurrences of two attached tags in tweets.
We first examine the prevalent topics of interest for the entire ED community. To capture relationships between different topics, we build an undirected, weighted hashtag network based on the co-occurrences of hashtags in tweets posted by ED users, where an edge is weighted by the co-occurrence count of two attached tags. To filter out noise from accidental co-occurrences and spam, we only consider hashtags used by more than 50 distinct users and observed in more than 50 tweets, resulting in a network of 312 nodes and 7,906 edges. Figure S5 shows the co-occurrence network of the most popular hashtags of interest for ED users. We observe that topics on promoting a thin ideal (e.g., "thinspo" and "thinspiration") are very prevalent in the community.
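A minimal sketch of building such a network is given below. It assumes tweets arrive as `(user_id, hashtags)` pairs, which is an illustrative input format, and the function name `cooccurrence_network` is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_network(tweets, min_users=50, min_tweets=50):
    """Build an undirected, weighted hashtag co-occurrence network.

    tweets: iterable of (user_id, list_of_hashtags) pairs.
    Returns (kept_tags, edge_weights), where edge_weights maps a sorted
    tag pair to the number of tweets in which both tags appear.
    """
    tweets = list(tweets)
    users_per_tag = defaultdict(set)
    tweets_per_tag = Counter()
    for user, tags in tweets:
        for tag in set(tags):
            users_per_tag[tag].add(user)
            tweets_per_tag[tag] += 1
    # Keep tags used by more than min_users users and in more than min_tweets tweets.
    kept = {
        tag for tag in tweets_per_tag
        if len(users_per_tag[tag]) > min_users and tweets_per_tag[tag] > min_tweets
    }
    edges = Counter()
    for _, tags in tweets:
        for a, b in combinations(sorted(set(tags) & kept), 2):
            edges[(a, b)] += 1
    return kept, edges
```

Sorting the tags before pairing stores each undirected edge under a single key, so `("proana", "thinspo")` and `("thinspo", "proana")` are counted together.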
We then examine interests of users with different dropout states. We split ED users into two sets based on their dropout states in our second observation, and extract hashtags from tweets posted by each set of users. Again, tags that are used by fewer than 50 users and occur in fewer than 50 tweets in each set are excluded. To adjust for tags that are popular in general, we use TF-IDF [10] to rank the specificity of a tag in each set of users. Table S2 lists the most representative hashtags in each user set, in which we find that users with different dropout states display distinct interests online. Non-dropouts are interested in advocating a thin ideal (e.g., "mythinspo" and "skinny4xmas") and reinforcing a pro-ED identity (e.g., "edlogic" and "beautiful"). In contrast, dropouts engage more in discussing their health problems (e.g., "selfharmprobz", "bulimicprobz", "anorexicprobz" and "relapse") and offering emotional support for others (e.g., "anasisters" and "stayingstrong"), which implies a tendency of these users to recover from disorders [11,12,13]. Together, these results imply that pro-recovery users are more likely to drop out than pro-ED users.

Table S2: The most popular hashtags used by ED users, grouped by users' dropout states.

Dropout state | Hashtags (a)
Non-dropout | legspo, mythinspo, skinny4xmas, bonespo, goals, edlogic, eatingdisorders, edthoughts, ribs, bones, depressed, depression, edprobs, collarbones, bulimia, promia, replytweet, beautiful, anorexia, thin, hipbones, legs, ednos, ed, thighgap, weightloss, skinny, proed, selfharm, perfection, mia, thinspiration, perfect, proana, diet, eatingdisorder
Dropout | goaway, stopbullying, worthless, selfharmprobz, ew, anasisters, yay, oneday, reasonstobefit, bulimicprobz, anorexicprobz, fact, disgusting, thankgod, willpower, tweetwhatyoueat, wow, toofat, jealous, thankyou, true, anasister, anafamily, starveon, gross, teamfollowback, fuck, icandothis, tired, edfamily, relapse, stayingstrong

(a) Tags for each state are ranked in decreasing order of the TF-IDF score of a tag, which is calculated as the ratio of the number of users who post the tag and have a given dropout state to the number of users who post the tag in the whole user sample (i.e., regardless of users' dropout states).
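The score defined in the Table S2 footnote can be sketched directly as a ratio of user counts. The input layout (one dictionary of tags per user and one of states per user) and the function name `tag_specificity` are assumptions made for illustration.

```python
from collections import Counter


def tag_specificity(tags_by_user, state_by_user, state):
    """Rank tags by (users in `state` who posted the tag) / (all users who posted it)."""
    total = Counter()
    in_state = Counter()
    for user, tags in tags_by_user.items():
        for tag in set(tags):  # count each user at most once per tag
            total[tag] += 1
            if state_by_user[user] == state:
                in_state[tag] += 1
    scores = {tag: in_state[tag] / total[tag] for tag in total}
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

A score of 1.0 means that every user who posted the tag has the given dropout state, whereas a tag posted equally across states scores close to that state's share of the sample.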
A comparison of interests between each individual set and the entire community (see Figure S5) further shows that the non-dropouts have dominated the topics of discussion within the community. This is expected because the non-dropouts have prolonged participation, with 732.17 active days on average compared to 278.40 days for the dropouts (see Table S1). Similarly, we split ED users into three equal-size sets based on their emotional scores and obtain the most representative hashtags among each set of users in Table S3. The results show that users with negative emotions engage more in promoting thin ideals (e.g., "bonespo" and "mythinspo"), showing interests that largely overlap with those of the non-dropouts. In contrast, users with neutral and positive emotions are more interested in discussing their health problems (e.g., "anorexicprobz" and "bulimicprobz"), opposing pro-ED promotions (e.g., "reversethinspo") and encouraging healthier body image and behaviors (e.g., "fitfam" and "fitness"), showing interests similar to those of the dropouts.
To further quantify the similarity (or association) of posting interests between users with a dropout state and those with an emotional state (as identified in Tables S2 and S3), we measure the Spearman rank correlation r between pairwise lists of hashtags posted by users with a given state (e.g., dropped-out or not, and positive or negative). We use the Spearman correlation because (i) it is more robust to scaling of data than other measures (e.g., cosine similarity) and (ii) it does not assume that datasets follow a specific distribution (e.g., a normal distribution in the Pearson correlation). The results of correlations are shown in Table 4 of the main text.
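As an illustration of this measure, the sketch below computes the Spearman correlation as the Pearson correlation of average ranks. It assumes the two hashtag lists have already been aligned into paired score vectors (one score per shared tag); how the study aligns the lists is not stated here, and the names `average_ranks` and `spearman` are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation of two equally long score vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    rx, ry = average_ranks(list(x)), average_ranks(list(y))
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Because only ranks enter the computation, rescaling either score vector by any monotonically increasing transformation leaves the result unchanged, which is the robustness property noted in (i).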