Low Testosterone on Social Media: Application of Natural Language Processing to Understand Patients’ Perceptions of Hypogonadism and Its Treatment

Background: Despite the results of the Testosterone Trials, physicians remain uncomfortable treating men with hypogonadism. Discouraged, men increasingly turn to social media to discuss medical concerns.
Objective: The goal of the research was to apply natural language processing (NLP) techniques to social media posts for identification of themes of discussion regarding low testosterone and testosterone replacement therapy (TRT) in order to inform how physicians may better evaluate and counsel patients.
Methods: We retrospectively extracted posts from the Reddit community r/Testosterone from December 2015 through May 2019. We applied an NLP technique called the meaning extraction method with principal component analysis (MEM/PCA) to computationally derive discussion themes. We then performed a prospective analysis of Twitter data (tweets) that contained the terms low testosterone, low T, and testosterone replacement from June through September 2019.
Results: A total of 199,335 Reddit posts and 6659 tweets were analyzed. MEM/PCA revealed dominant themes of discussion: symptoms of hypogonadism, seeing a doctor, results of laboratory tests, derogatory comments and insults, TRT medications, and cardiovascular risk. More than 25% of Reddit posts contained the term doctor, and more than 5% urologist.
Conclusions: This study represents the first NLP evaluation of the social media landscape surrounding hypogonadism and TRT. Although physicians traditionally limit their practices to within their clinic walls, the ubiquity of social media demands that physicians understand what patients discuss online. Physicians may do well to bring up online discussions during clinic consultations for low testosterone to pull back the curtain and dispel myths.


Introduction
The Testosterone Trials were a coordinated series of placebo-controlled, double-blinded trials intended to elucidate risks and benefits of testosterone replacement therapy (TRT) in hypogonadal men [1-7]. Despite these recent trials, clinicians continue to be uncomfortable treating these men, in part due to unanswered questions related to cardiovascular outcomes and cancer risk, as well as how TRT is portrayed in popular culture. Perhaps discouraged by conflicting information from physicians and traditional media, patients sometimes turn to social media platforms to discuss medical concerns with peers [8,9].
Interactive social media channels have emerged as potent resources for individuals to discuss health care concerns [9]. Reddit, an anonymous discussion platform with over 330 million monthly active users, serves as a popular internet destination for discussions of health-related topics [10]. The Reddit forum or subreddit r/Testosterone [11], which boasts over 30,000 active members, is devoted to answering questions, sharing personal accounts, and disseminating resources related to TRT and testosterone levels. Similar discussions occur on other social media sites, including Twitter, a microblogging platform with over 126 million daily active users [12].
We hypothesized that the content of online discussions about low testosterone can be classified into themes that may inform how physicians evaluate, counsel, and treat men with hypogonadism. Here, we apply quantitative natural language processing (NLP) techniques to identify dominant themes of discussions regarding low testosterone and TRT on social media.

Study Design and Sources of Data
An overview of our methodology is presented in Figure 1. The study comprised three phases: extraction of data from social media platforms (Figure 1A), automated organization of textual data (Figure 1B), and quantitative analysis of the textual data to identify dominant themes of the text (Figure 1C).
First, we retrospectively processed posts and comments from the Reddit community r/Testosterone from December 2015 through May 2019. Reddit data were extracted using BigQuery (Google LLC), an enterprise data analytics platform, from a dataset uploaded for public use [13] (Figure 1A). We evaluated both parent posts (the main post in a Reddit discussion) and comment posts (submitted in response to a parent post). We applied a word count criterion of >20 words for parent posts to exclude potential spam, deleted text, and posts composed only of links to other websites. As we anticipated the average word count of comment posts to be less, we used a more relaxed word count criterion of >5 words for comment posts.
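The word count criteria described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline (the source names only BigQuery for extraction); the function names and the whitespace-based word count are assumptions made for illustration.

```python
# Thresholds taken from the text: parent posts need >20 words, comment posts >5.
# All names below are hypothetical; whitespace tokenization is an assumption.
MIN_WORDS = {"parent": 20, "comment": 5}

def word_count(text: str) -> int:
    """Count whitespace-delimited tokens in a post."""
    return len(text.split())

def passes_word_criterion(text: str, kind: str) -> bool:
    """Return True when the post exceeds the threshold for its kind."""
    return word_count(text) > MIN_WORDS[kind]

def filter_posts(posts):
    """Keep (kind, text) pairs that satisfy the word count criterion."""
    return [(kind, text) for kind, text in posts if passes_word_criterion(text, kind)]

posts = [
    ("parent", "[deleted]"),                       # removed text, too short
    ("parent", "word " * 21),                      # 21 words, kept
    ("comment", "Thanks!"),                        # too short
    ("comment", "Get your labs done before you see the doctor"),  # 9 words, kept
]
kept = filter_posts(posts)
```

Using strict inequalities mirrors the ">20" and ">5" wording, so a parent post of exactly 20 words would be excluded.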
Next, Twitter data (tweets) were collected prospectively from June through September 2019 using the rtweet application [14], which integrates tweets for processing in RStudio version 1.1.463 (RStudio PBC) (Figure 1A). We extracted tweets containing the terms low testosterone, low T, and testosterone replacement. We applied a word count criterion for tweets (>5 words per tweet), given the character count limitation imposed by the Twitter platform. Retweets (reposts of an identical, previously published tweet) were excluded from analysis.
Figure 1. Overview of methods: (A) extraction of Reddit and Twitter data using BigQuery and rtweet, respectively; (B) processing of raw text data using the meaning extraction helper to generate a binary text matrix for each data set; (C) meaning extraction method with principal component analysis generates word clusters for each dataset. Rotated component plots are shown with x-, y-, and z-axes representing the three clusters that capture the greatest variance of the data. MEH: meaning extraction helper; MEM: meaning extraction method; PCA: principal component analysis.
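The tweet inclusion rules (search term present, more than 5 words, not a retweet) might be expressed as follows. This is a sketch in Python rather than the R-based rtweet workflow the study used; the case-insensitive, word-boundary matching and the `is_retweet` flag are assumptions, since the source does not specify how matching was performed.

```python
import re

# Search terms from the text. Case-insensitive matching on word boundaries
# is an assumption made for this illustration.
TERM_PATTERN = re.compile(
    r"\b(low testosterone|low t|testosterone replacement)\b", re.IGNORECASE
)

def keep_tweet(text: str, is_retweet: bool) -> bool:
    """Apply the three inclusion rules described in the text."""
    if is_retweet:                      # retweets were excluded
        return False
    if len(text.split()) <= 5:          # word count criterion: >5 words
        return False
    return TERM_PATTERN.search(text) is not None

examples = [
    ("Low T can lead to weight gain and fatigue", False),
    ("Low T can lead to weight gain and fatigue", True),
    ("low T is bad", False),
    ("I love lifting heavy weights every single day", False),
]
decisions = [keep_tweet(text, rt) for text, rt in examples]
```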

Natural Language Processing Using the Meaning Extraction Method
Reddit parent posts, Reddit comment posts, and tweets from Twitter were separately subjected to an NLP technique called the meaning extraction method (MEM) [15] with principal component analysis (PCA). MEM/PCA tracks words that cluster together to derive themes quantitatively [15]. This approach has been previously validated to reveal information about individuals' personalities, communication strategies, and behaviors [16,17].
To automate the MEM, we used the topic modeling application meaning extraction helper version 2 [18] to deconstruct each post or tweet into its component words. Stop words (eg, articles, prepositions, and transitions) were filtered out. Remaining words were ranked by their frequencies of appearance in each post or tweet (Figure 1B). Words were then subjected to PCA with varimax rotation (Figure 1C) using SPSS Statistics version 25 (IBM Corporation). PCA identified clusters of words that frequently appeared together. Each word was conferred a factor loading, the correlation coefficient between the word and the cluster to which it belonged. Factor loading thresholds of >0.20 are appropriate when performing PCA of text data to capture a sufficient proportion of the variance in the data [19,20]. We assigned a descriptive theme to each cluster based on the words within it.
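The MEM/PCA steps described above (binary word-by-document matrix, PCA, varimax rotation, loading threshold) can be sketched in Python with NumPy. This is an illustrative approximation only: the study used the meaning extraction helper and SPSS, and the stop word list, toy corpus, vocabulary, and function names here are all hypothetical. The rotation follows the commonly used SVD-based varimax iteration without Kaiser normalization, which may differ from the SPSS defaults.

```python
import numpy as np

# Hypothetical, abbreviated stop word list for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "my", "is", "for", "with"}

def binary_matrix(docs, vocab):
    """Rows are documents, columns are words: 1 if the word appears, else 0."""
    rows = []
    for doc in docs:
        tokens = {t for t in doc.lower().split() if t not in STOP_WORDS}
        rows.append([1.0 if word in tokens else 0.0 for word in vocab])
    return np.array(rows)

def pca_loadings(X, n_components):
    """Unrotated loadings: correlation-matrix eigenvectors scaled by sqrt(eigenvalue)."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation via the SVD-based fixed-point iteration."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        gradient = loadings.T @ (rotated ** 3 - rotated * (rotated ** 2).sum(axis=0) / p)
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt
        previous, criterion = criterion, s.sum()
        if criterion < previous * (1 + tol):
            break
    return loadings @ rotation

def words_by_component(loadings, vocab, threshold=0.20):
    """List the words whose absolute loading exceeds the threshold, per component."""
    return {
        j: [w for w, value in zip(vocab, loadings[:, j]) if abs(value) > threshold]
        for j in range(loadings.shape[1])
    }

# Toy corpus (invented): two loose topics plus unrelated posts.
docs = (
    ["fatigue and low libido"] * 6
    + ["constant fatigue"] * 2
    + ["doctor appointment for labs"] * 3
    + ["new doctor"]
    + ["eat clean lift heavy"] * 8
)
vocab = ["fatigue", "libido", "doctor", "appointment"]

X = binary_matrix(docs, vocab)
unrotated = pca_loadings(X, n_components=2)
rotated = varimax(unrotated)
clusters = words_by_component(rotated, vocab)
```

Because the rotation is orthogonal, each word's communality (the sum of its squared loadings) is unchanged; only the distribution of loadings across components shifts, which is what makes the rotated clusters easier to label with a theme.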

Subset Analyses on Key Topics of Interest
Given widespread interest and controversy regarding the potential associations of TRT with cardiovascular disease and prostate cancer risk, we sought to quantitate the appearance of these topics on Reddit and Twitter. Subset analysis was performed to determine the frequencies of the words prostate, cancer, PSA (prostate-specific antigen), heart, attack, stroke, cardiovascular, and death. Furthermore, to identify the degree to which individuals allude to seeking consultation with a health care provider, an additional analysis was performed to determine the frequencies of the relevant terms doctor, urologist, endocrinologist, and appointment.
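As a rough illustration of this subset analysis, the following Python sketch computes, for each term of interest, the number and percentage of posts that contain it. Counting each post at most once per term and matching whole lowercase words are assumptions; the source does not describe the exact counting rules. The function names and sample posts are hypothetical.

```python
import re

# Terms of interest listed in the text (lowercased for matching).
TERMS = [
    "prostate", "cancer", "psa", "heart", "attack", "stroke",
    "cardiovascular", "death",
    "doctor", "urologist", "endocrinologist", "appointment",
]

def tokenize(text: str) -> set:
    """Return the set of lowercase alphabetic tokens in a post."""
    return set(re.findall(r"[a-z]+", text.lower()))

def term_frequencies(posts, terms):
    """Return {term: (number of posts containing it, percentage of posts)}."""
    n = len(posts)
    token_sets = [tokenize(post) for post in posts]
    result = {}
    for term in terms:
        count = sum(term in tokens for tokens in token_sets)
        result[term] = (count, 100.0 * count / n if n else 0.0)
    return result

sample_posts = [
    "My doctor ordered a PSA test",
    "Saw the urologist and the doctor today",
    "Lift heavy, sleep well",
    "doctor doctor doctor",
]
freq = term_frequencies(sample_posts, TERMS)
```

Exact word matching means that plurals such as "doctors" are not counted under this sketch; a stemming step would be needed to capture them.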

Statistical Validity of Principal Component Analysis
To assess applicability of PCA to each dataset, the Kaiser-Meyer-Olkin (KMO) statistic, a measure of sampling adequacy (values >0.60 are adequate), and the Bartlett test for sphericity, which tests if there are significant correlations among variables of interest, were calculated [21].
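For readers unfamiliar with these diagnostics, a minimal NumPy sketch of the standard formulas follows. It is not the SPSS implementation used in the study; the function names and the simulated data are hypothetical. The Bartlett statistic is computed as -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom, and the overall KMO value compares squared correlations with squared partial correlations.

```python
import numpy as np

def bartlett_sphericity(X):
    """Return (chi_square, degrees_of_freedom) for Bartlett's test of sphericity."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)          # log of the determinant of R
    chi_square = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return chi_square, dof

def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(X, rowvar=False)
    R_inv = np.linalg.inv(R)
    scale = np.sqrt(np.outer(np.diag(R_inv), np.diag(R_inv)))
    partial = -R_inv / scale                  # partial correlations (off-diagonal)
    off_diagonal = ~np.eye(R.shape[0], dtype=bool)
    r2 = (R[off_diagonal] ** 2).sum()
    p2 = (partial[off_diagonal] ** 2).sum()
    return r2 / (r2 + p2)

# Simulated data (invented): four variables, two of them deliberately correlated.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 4))
data[:, 1] += data[:, 0]

chi_square, dof = bartlett_sphericity(data)
kmo_value = kmo(data)
```

A P value for the Bartlett statistic would come from the upper tail of a chi-square distribution with `dof` degrees of freedom (for example, via `scipy.stats.chi2.sf`), and the KMO value would be compared with the 0.60 adequacy threshold stated above.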

Ethics
Consistent with previous investigations on social media data, this work was exempted by the institutional review board of the University of California, Los Angeles, as it involves publicly available data and does not involve human subjects.

Total Number of Posts Extracted From Social Media
From the r/Testosterone community on Reddit, we retrospectively extracted 19,083 parent posts and 218,082 comment posts over the 42-month period of study. After exclusions, 12,665 parent posts and 186,670 comment posts remained. From Twitter, we prospectively extracted 7467 tweets over 4 months; 6659 tweets remained after exclusions.
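The counts above can be cross-checked with a few lines of Python. The counts are taken from the text; the retention percentages are derived here for illustration and are not reported in the source.

```python
# Counts reported in the text, before and after exclusions.
extracted = {"reddit_parent": 19083, "reddit_comment": 218082, "tweets": 7467}
retained = {"reddit_parent": 12665, "reddit_comment": 186670, "tweets": 6659}

# Total Reddit posts analyzed (parent plus comment posts after exclusions).
reddit_total = retained["reddit_parent"] + retained["reddit_comment"]

# Percentage of extracted items that survived the exclusion criteria.
retention = {k: 100.0 * retained[k] / extracted[k] for k in extracted}

print(reddit_total)  # 199335, matching the Reddit total given in the abstract
for source, pct in retention.items():
    print(f"{source}: {pct:.1f}% retained")
```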

Natural Language Processing of Reddit Data
For both Reddit parent posts and comment posts, MEM identified 5 factors, or thematic word clusters, comprising words with factor loadings greater than 0.30 (parent posts) and 0.20 (comment posts) (Tables 1 and 2).
The following themes emerged from NLP of Reddit data: seeing a doctor, results of laboratory tests, administration of TRT, and lifestyle interventions (both parent posts and comment posts); symptoms of hypogonadism (parent posts only); and TRT medications (comment posts only). Table 3 contains representative quotations that feature each Reddit theme. Some quotes have been abridged in the interest of space.
Table 3. Representative quotations for each theme derived from the meaning extraction method. Asterisks are part of the quotations and do not refer to anything in the table.

Results of laboratory tests
Here's what came up: shbg and dhea still pending. I had to get these results because i have an appointment with neurosurgeon soon and he will need the labs and mri.

Lifestyle interventions
Have been eating super clean. Working with a dietitian/personal trainer. Was dieting mostly high protein / low fat / low carb I work out all the time lifting heavy weights, 3 or 4 times a week on average. I eat a good diet, take my zinc, vitamin D, and get in my fats and essential fats.

Seeing a doctor
I know several people on trt, but they all have the same doc...you walk in, tell him you want to get bigger, stronger, and faster, pay out of pocket for his blood test then buy your meds from his attached pharmacy. That's not what I want. I want to find out what's wrong without a preconceived bias.
So I go to the appointment. And the specialist I saw (a urologist) said he wasn't the guy to see about this issue, and ended up referring me to another specialist. I literally can't hold out a week to get another blood test and also I can't afford it right now.

Symptoms of hypogonadism
All the normal symptoms: brain fog, mood swings, low libido, erectile dysfunction, inability to add muscle at the gym despite working out 3x a week.
Symptoms: brain fog, very low energy level, lifelessness-zombie feeling most days, very lethargic, mood swings, easy to get angry, grumpy and annoyed at earliest, no libido/sex drive, ED - less frequency, less powerful, minimal to no erections during sex, softer (haven't had sex in years)

Seeing a doctor
Many doctors-especially PCPs-are not fluent in the endocrine system. They aren't supposed to be. Going to your primary care physician for hormone questions is a mistake. If you knew you had heart issues, wouldn't you go to a cardiologist?
My PCP looked super confused and clueless as to what he was supposed to do for me. Doc made me do two more labs fasting to confirm then he referred me out to an endocrinologist. The endo made me do three more fasting labs and a testicular ultrasound to confirm. 198 is low as hell for your dad, and even 450 for him would be low. Yours is lowish, but you have definite symptoms.

Testosterone replacement therapy administration
75 mg E5D (105 mg per week). Doesn't require an AI, doesn't give me side effects. I am at ~700 on trough days and feel pretty damn good. I had just moved to a standard TRT dose of Test Cyp, 100 mg/week. At 5'11", 172 lbs, and 17% body fat, taking 1 mg of Arimidex every day tanked my E2. Dropping down to 0.25 mg Arimidex once a week had the same effect.

Lifestyle interventions
Eat good food, lift heavy, and get sleep. Repeat for two years.
TRT will not turn you into a bodybuilder. It may tone you a little bit (if everything is in check). But just saying "I eat good" literally means nothing. What are your macros? What's your diet? Etc?

Testosterone replacement therapy medications
HCG is a water-based peptide hormone that can be injected to replace the lost LH hormone that TRT shuts down. Without hCG, the LH receptors in the testes are no longer getting activated. The results: the testes shrink.
Clomiphene. What a double-edged sword. First, Clomid will certainly have an effect on your testosterone levels. Usually, it is doses substantially higher than 12.5 mgs daily.

Symptoms of hypogonadism
Keeping your hormone levels up is a crucial part of #health. Low T can lead to all types of adverse effects: -weight gain/belly fat -#LowEnergy -low sex drive This, in turn, causes a lower sex drive, depression, reduced muscle mass, and low levels of energy. Erectile dysfunction is another symptom.

Cardiovascular risk
#Testosterone Replacement Therapy Lowers Heart Attack Risk Aging men with low testosterone levels who take testosterone replacement therapy (TRT) are at a slightly greater risk of experiencing an ischemic stroke

Symptom improvement
Starting testosterone replacement therapy and thyroid medication at the same time is quite the 1-2 punch to the system. Endless energy, great sleep, and able to lift weights heavier and longer. "My energy is back": how testosterone replacement therapy is changing men's lives

Derogatory comments and insults
That little cuck should be the poster boy for low T supplements

Natural Language Processing of Twitter Data
Similarly, MEM for Twitter data identified 4 factors, or thematic word clusters, with factor loadings greater than 0.25 (Table 4). The following themes emerged from NLP of tweets: symptoms of hypogonadism, cardiovascular risk, symptom improvement, and derogatory comments and insults.
The highest frequency word occurrences among tweets as determined by PCA were level (693/6659, 10.40%), male (426/6659, 6.40%), sex (213/6659, 3.20%), and increase (200/6659, 3.00%). Twitter PCA accounted for 9.01% (600/6659) of the total variance. The KMO statistic was 0.61 for Twitter data, and the Bartlett test yielded P<.01, indicating that the Twitter data were appropriate for factor analysis using PCA. Of note, other studies using MEM/PCA have reported similar percentages of variance as those determined in our analysis of Reddit and Twitter data [22,23].
Table 4. Thematic clusters, word frequencies, and associated factor loading coefficients derived from the meaning extraction method with principal component analysis of tweets about low testosterone, low T, or testosterone replacement on Twitter (n=6659).

Word Occurrences on Key Topics of Interest
Subset analysis was performed to determine word occurrence frequencies in three key topics of interest that relate to TRT: prostate cancer risk, cardiovascular disease risk, and seeking consultation with a health care professional. These data are presented in Table 5.

Principal Findings
NLP techniques applied to unfiltered discussions on Reddit and Twitter offer a useful framework for understanding patient priorities outside the doctor's office. We found that men largely turn to social media to learn about symptoms of low testosterone, interpretation of personal lab results, practicalities of TRT, and body changes with treatment. Notably, cardiovascular risk was a major discussion theme, echoing concerns among prescribers, who may be deterred by continued ambiguity despite the publication of the Testosterone Trials. Although NLP analysis did not reveal prostate cancer as a notable theme, a number of posts included text related to this topic, suggesting that it may be an important point in a subset of online discussions related to TRT.
Our results underscore that patients are searching for medical guidance related to hypogonadism on social media, an environment where anecdotes predominate and advertising often masquerades as medical advice [24]. TRT prescriptions have risen almost 4-fold over the last two decades, which can be attributed, in part, to off-label indications and direct-to-consumer advertising [25]. Even beyond standard TRT, testosterone-boosting supplements with minimal data to support their efficacy are aggressively marketed and readily available online [26]. Nevertheless, social media represents an enormous opportunity for the medical community to improve how we engage with our patients and to do so in a meaningful and impactful way. Potential interventions that may inoculate against coercive direct-to-consumer marketing practices include disseminating high-quality, open-access information related to hypogonadism. For example, Halpern et al [27] recently published a JAMA Patient Page article on hypogonadism. This single-page handout written in easily accessible language includes an infographic highlighting symptoms of hypogonadism and potential adverse effects of TRT, in addition to information related to etiology of hypogonadism and a discussion of potential cardiovascular and prostate cancer risks associated with TRT, all topics that emerged as major themes of discussion from our data.
Social media platforms, including Reddit and Twitter, create a space for patients not only to obtain answers to questions that they are either uncomfortable or unwilling to ask in a face-to-face clinical setting but also to connect with others going through similar experiences. However, not all health-related discussions online are productive. Twitter featured the theme of derogatory comments and insults, highlighting an undertone of stigma, which may compound existing barriers preventing men from accessing care [28]. In contrast, the seeing a doctor theme only emerged on Reddit, with more than 25% of parent posts mentioning the word doctor, compared with less than 2% on Twitter. This may reflect inherent differences between the two social media platforms, as Twitter is constrained by a strict character count limitation and is overall less anonymous, with discussants frequently using their true identities in their display usernames and account photos.
Although clinician engagement with the online hypogonadism community will become increasingly important in the coming years, improving the in-office clinical experience of our patients cannot be overemphasized. Our data reveal that many of the online discussions featured personal questions related to interpretation of lab results. This is consistent with a previous study exploring Reddit discussions of male factor infertility, where nearly 20% of all posts featured a question related to personal semen analysis results [29]. Such discussions related to lab results cannot be addressed by disseminating a primer on hypogonadism and TRT, but instead demand the expertise of a clinician trained in managing male endocrinology and the related sexual, reproductive, and psychological comorbidities. Creating an in-office experience where men feel comfortable and safe to ask their questions and voice their concerns should be a priority for any outpatient clinical setting, but especially one that caters to men with suspected hypogonadism. Both outpatient primary care settings and urological outpatient clinics can learn from the success of the emerging multidisciplinary men's health clinic [30].
Here we offer insight into patient concerns, drawn from forums that allow honest and unfiltered feedback on discussants' experiences with hypogonadism. Clinically, these data highlight that patients worry most about comorbidities, lifestyle factors impacted by low testosterone, and treatment options. While other aspects of hypogonadism can be discussed, these data highlight the most salient hypogonadism-related concerns for our patients. Additionally, this study can further improve on patients' in-office experiences by informing how physicians can lead discussions to highlight aspects of low testosterone that patients may feel are not being adequately addressed.

Limitations
Our study is not without limitations. Although NLP techniques allowed us to analyze a large volume of discrete social media posts, generalizability of MEM is limited by the absence of contextual valence (positivity or negativity). However, this does not impair overall thematic identification. Additionally, discussants who turn to social media for health care information may be different with respect to demographics, health care priorities, and information preferences compared with those who do not; our results should therefore be interpreted within this context [31]. It should also be noted that some individuals use social media as a platform to vent about their experiences with health care professionals as they relate to hypogonadism care. This is an important distinction to make because it may not necessarily represent a lack of communication between patients and their physician but rather a discussant's opportunity to share. Future studies may consider extending this analysis to other Reddit communities, expanding Twitter search terms, or exploring other social media platforms.

Conclusions
This study represents the first evaluation of the social media landscape surrounding hypogonadism and TRT using NLP techniques. Our analysis of more than 200,000 discrete social media posts revealed dominant themes of discussion, which may inform how physicians evaluate and counsel men with hypogonadism. Understanding the complex internet landscape of hypogonadism discussions represents the first step in creating well-informed and clinically meaningful change. Although physicians traditionally limit their practices to within their clinic walls, the ubiquity of social media demands that physicians engage patients where they are, including online. Practicing physicians may do well to bring up online discussions during clinic consultations, to pull back the curtain and dispel myths.