Stroke Survivors on Twitter: Sentiment and Topic Analysis From a Gender Perspective

Background: Stroke is the leading cause of long-term disability worldwide. Women experience more activity limitations, worse health-related quality of life, and more poststroke depression than men. Twitter is increasingly used by individuals to broadcast their day-to-day happenings, providing unobtrusive access to samples of spontaneously expressed opinions on all types of topics and emotions.
Objective: This study aimed to (1) consider the raw frequencies of words in the collection of tweets posted by a sample of stroke survivors and compare the posts by gender of the survivor for 8 basic emotions (anger, fear, anticipation, surprise, joy, sadness, trust, and disgust); (2) determine the proportion of each emotion in the collection of tweets and statistically compare each of them by gender of the survivor; (3) extract the main topics (represented as sets of words) that occur in the collection of tweets, relative to each gender; and (4) assign happiness scores to tweets and topics (using a well-established tool) and compare them by gender of the survivor.
Methods: We performed sentiment analysis based on a state-of-the-art lexicon (National Research Council) with the syuzhet R package. The emotion scores for men and women were first subjected to an F-test and then to a Wilcoxon rank sum test. We extended the emotional analysis by assigning happiness scores with the hedonometer (a tool specifically designed considering Twitter inputs). We calculated daily average happiness scores for all tweets. We created a term map for an exploratory clustering analysis using VOSviewer software. We performed structural topic modeling with the stm R package, allowing us to identify the main topics by gender. We assigned happiness scores to all the words defining the main identified topics and compared them by gender.
Results: We analyzed 800,424 tweets posted from August 1, 2007, to December 1, 2018, by 479 stroke survivors: women (n=244) posted 396,898 tweets, and men (n=235) posted 403,526 tweets. The stroke survivor condition and gender, as well as membership in at least 3 stroke-specific Twitter lists of active users, were manually verified for all 479 participants. Their total number of tweets since 2007 was 5,257,433; therefore, we analyzed the most recent 15.2% of all their tweets. Positive emotions (anticipation, trust, and joy) were significantly higher (P<.001) in women, while negative emotions (disgust, fear, and sadness) were significantly higher (P<.001) in men, both in the analysis of raw frequencies and in the analysis of the proportion of emotions. Mean happiness scores throughout the considered period show higher levels of happiness in women. We calculated the top 20 topics (with percentages and CIs) more likely to be addressed by each gender and found that women’s topics show higher happiness scores.
Conclusions: We applied two different approaches (the Plutchik model and the hedonometer tool) to a sample of stroke survivors’ tweets. We conclude that women express positive emotions and happiness much more than men.

geographic locations, obtained by means of the rtweet library.
We were able to identify the geographic locations of 378 of the 479 users (78.91%). As shown in Table A0, most of the users are from 4 countries: 356 of the 378 users (94.18%) are from Australia (AU), Canada (CA), the United Kingdom (UK), or the United States (US).
The United States is the country with the most users: 206 of the 378 (54.50%). The United Kingdom, with 113 users (29.89%), is the second most represented country; together, the two countries account for 319 of the 378 located users (84.39%).

Wordclouds from participants' profile self-descriptions
Figure A2. Wordclouds of the top 500 words in users' profile descriptions (men top, women bottom)

VOSviewer Cluster Analysis
As an initial exploration for the topic analysis, we present a visualization in which words that appear together in the text are placed together and words that appear more frequently are displayed more prominently. A term map is a two-dimensional representation of a field in which strongly related terms are located close to each other and less strongly related terms are located further away from each other. While several programs are available for analyzing text units and similarity matrices, the emphasis of VOSviewer is visualization (VOS stands for Visualization of Similarities). It has been argued that the VOS mapping technique yields more satisfactory maps than popular approaches based on multidimensional scaling: maps constructed with those approaches are shown to suffer from certain artifacts, whereas maps constructed with the VOS mapping technique do not (Waltman et al 2010).
Tweets are tokenized into terms, and terms are assigned to clusters by maximizing a quality function. The quality function is a variant of the modularity function of Newman and Girvan (2004) and Newman (2004), developed in the field of network science and based on the Potts model. To maximize the quality function, the VOSviewer clustering technique uses the smart local moving algorithm introduced by Waltman and Van Eck (2013). The local moving heuristic repeatedly moves individual nodes from one community to another in such a way that each node movement results in a modularity increase. The local moving heuristic iterates over the nodes in a network in a random order. For each node, it is determined whether it is possible to increase modularity by moving the node from its current community to a different (possibly empty) community. If increasing modularity is indeed possible, the node is moved to the community that results in the largest modularity gain. The local moving heuristic keeps moving nodes until a situation is reached in which there are no further possibilities to increase modularity through individual node movements.

Figure A4. Criteria for the optimal number of topics; k=7 is optimal (women top, men bottom)
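
The local moving heuristic described in this section can be illustrated with a minimal Python sketch. It is not VOSviewer's implementation: it uses the standard Newman-Girvan modularity rather than the VOSviewer variant of the quality function, it covers only the basic local moving step (not the full smart local moving algorithm), and the function names, the toy co-occurrence network, and its terms are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def build_graph(edges):
    """Build a symmetric adjacency map {node: {neighbor: weight}} (no self-loops)."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    return dict(adj)


def modularity(adj, community):
    """Newman-Girvan modularity of an undirected weighted graph."""
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())
    internal = defaultdict(float)  # within-community weight, counted in both directions
    total = defaultdict(float)     # summed degree of each community
    for u, nbrs in adj.items():
        total[community[u]] += sum(nbrs.values())
        for v, w in nbrs.items():
            if community[u] == community[v]:
                internal[community[u]] += w
    return sum(internal[c] / two_m - (total[c] / two_m) ** 2 for c in total)


def local_moving(adj, seed=0):
    """Move single nodes between communities until no move increases modularity."""
    rng = random.Random(seed)
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    two_m = sum(degree.values())
    community = {u: u for u in adj}   # start with every node in its own community
    total = dict(degree)              # summed degree per community label
    size = {u: 1 for u in adj}        # number of nodes per community label
    nodes = list(adj)
    moved = True
    while moved:
        moved = False
        rng.shuffle(nodes)            # visit the nodes in a random order
        for u in nodes:
            old = community[u]
            links = defaultdict(float)  # weight from u to each neighboring community
            for v, w in adj[u].items():
                links[community[v]] += w
            total[old] -= degree[u]   # temporarily take u out of its community
            size[old] -= 1

            def gain(c):
                # Modularity gain (scaled by m) of placing the isolated node u into c.
                return links.get(c, 0.0) - degree[u] * total[c] / two_m

            best, best_gain = old, gain(old)
            for c in links:
                if gain(c) > best_gain + 1e-12:
                    best, best_gain = c, gain(c)
            if best_gain < -1e-12:
                # Every occupied option lowers modularity: use an empty community (gain 0).
                best = next(c for c in size if size[c] == 0)
            community[u] = best
            total[best] += degree[u]
            size[best] += 1
            if best != old:
                moved = True
    return community


# Hypothetical toy term network: two tightly linked groups joined by one weak tie.
toy_edges = [
    ("stroke", "recovery", 1), ("stroke", "rehab", 1), ("recovery", "rehab", 1),
    ("family", "love", 1), ("family", "happy", 1), ("love", "happy", 1),
    ("rehab", "family", 1),
]
toy_graph = build_graph(toy_edges)
toy_clusters = local_moving(toy_graph)
print(toy_clusters, modularity(toy_graph, toy_clusters))
```

Each accepted move strictly increases modularity, so the loop terminates; the result is a local optimum that can depend on the random visiting order, which is one reason more elaborate schemes such as smart local moving exist.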

Statistical comparisons
We then used t tests to compare the mean scores of each pair of topics, pairing topics placed in the same position of Figure 6 (eg, Topic 19 with Topic 1 because they are both at the extremes of Figure 6, then Topic 17 with Topic 6, and so on). All comparisons are shown in the figure below.
Figure 8A. Comparisons of happiness scores for corresponding topics
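
A pairwise comparison of this kind can be sketched in Python as follows. The text does not state which t test variant was used, so the Welch (unequal variances) form here is an assumption; the function name and the word-level happiness scores are invented for illustration and are not the study's data.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and its Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of the mean of a
    vb = variance(b) / len(b)  # squared standard error of the mean of b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical happiness scores for the words defining one pair of topics.
women_topic_scores = [6.1, 7.3, 5.9, 8.0, 6.7]
men_topic_scores = [5.2, 6.0, 4.8, 6.4, 5.5]

t_stat, dof = welch_t(women_topic_scores, men_topic_scores)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

The P value follows from the t distribution with the returned degrees of freedom; when SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic together with the P value.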

Plutchik model
Positive and negative valences are associated with emotions based on Plutchik's Psychoevolutionary Model of the Primary Emotions (TenHouten 2017). Plutchik developed a psychoevolutionary model in which he identified four life problems: identity, temporality, hierarchy, and territoriality. Each of these life problems can be either an opportunity or a danger, so that a situation is either positively or negatively valenced. Any of the eight problem-valence situations can therefore occur, each of which triggers a distinct subjective state of mind that activates an adaptive reaction. These eight prototypical adaptive reactions constitute the 8 primary emotions, each associated with a positive or negative valence.
It might seem counterintuitive to consider anger a positive emotion, and indeed several researchers have assigned it a negative valence because it is associated with unpleasant eliciting situations (Ekman and Davidson 1994; Ekman and Friesen 1975) and occurs in situations that are incongruent with one's goals (Lazarus 1991). But anger is positive in that it evokes behavioral tendencies of approach and is associated with attack behavior, functioning to move aside or destroy an obstacle standing in the way of a goal, which is an assertive course of action (Darwin 1872; Plutchik 1980). Fear is a self-protective response of moving away from a dangerous or problematic person or situation, which is clearly of negative valence insofar as it is both unpleasant and avoidance motivated.