This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
Low back pain (LBP) remains the leading cause of disability worldwide. A better understanding of beliefs regarding LBP and of its impact on the individual is important in order to improve outcomes. Although personal experiences of LBP have traditionally been explored through qualitative studies, social media allows access to data from a large, heterogeneous, and geographically distributed population, which is not possible using traditional qualitative or quantitative methods. Because data on social media sites are collected in an unsolicited manner, individuals are more likely to express their views and emotions freely and without constraint compared with traditional data collection methods. Thus, content analysis of social media provides a novel approach to understanding how problems such as LBP, and their impact, are perceived by those who experience them.
The objective of this study was to identify contextual variables of the LBP experience from a first-person perspective to provide insights into individuals’ beliefs and perceptions.
We analyzed 896,867 cleaned tweets about LBP between January 1, 2014, and December 31, 2018. We tested and compared latent Dirichlet allocation (LDA), Dirichlet multinomial mixture (DMM), GPU-DMM, biterm topic model, and nonnegative matrix factorization for identifying topics associated with tweets. A coherence score was determined to identify the best model. Two domain experts independently performed qualitative content analysis of the topics with the strongest coherence score and grouped them into contextual categories. The experts met and reconciled any differences and developed the final labels.
LDA outperformed all other algorithms, resulting in the highest coherence score. The best model was LDA with 60 topics, with a coherence score of 0.562. The 60 topics were grouped into 19 contextual categories. “Emotion and beliefs” had the largest proportion of total tweets (157,563/896,867, 17.6%), followed by “physical activity” (124,251/896,867, 13.85%) and “daily life” (80,730/896,867, 9%), while “food and drink,” “weather,” and “not being understood” had the smallest proportions (11,551/896,867, 1.29%; 10,109/896,867, 1.13%; and 9180/896,867, 1.02%, respectively). Of the 11 topics within “emotion and beliefs,” 113,562/157,563 (72%) had negative sentiment.
The content analysis of tweets in the area of LBP identified common themes that are consistent with findings from conventional qualitative studies but provide a more granular view of individuals’ perspectives related to LBP. This understanding has the potential to assist with developing more effective and personalized models of care to improve outcomes in those with LBP.
Low back pain (LBP) is the leading cause of disability worldwide [
Optimizing management of conditions such as LBP requires consumers to be engaged in their care. To enable this, health care providers need to have an understanding of the full context of the condition from the consumer perspective. “Contextual variables” here refer to any type of useful information about the context of an individual’s pain experience, such as physical, emotional, social, and/or occupational variables [
With the current advances in online and web technologies, social media has emerged as a new and rich source of first-person health care data [
Our study approach was to undertake content analysis of Twitter data by applying topic modeling. Content analysis is a widely used technique for qualitative research [
Twitter was used as the data source rather than other social media platforms, blog posts, or news articles because individuals use this platform for expressing and sharing their feelings and opinions on health-related topics by posting short messages that can be easily collected through application programming interfaces (APIs) or other open sources [
Our data processing and analysis consisted of 4 steps (see
Keywords used to search tweets related to low back pain.
Source | Study purpose | Keywords | Total, n |
Lee et al, 2016 [ | To quantify the risks associated with a new tweet about back pain | “painful back,” “sore back,” “back started hurting,” “buggered my back,” “hurt my back,” “I’ve got backache,” “injured my back,” “my back hurts,” “I’ve got back pain,” “pain in my back,” “put my back out,” “my back is killing me” | 12 |
Ahlwardt et al, 2014 [ | To compare self-reported toothache experiences in tweets with those of backache, earache, and headache | “backache,” “back ache,” “back aches,” “back hurt,” “back hurting,” “back hurts,” “back killin’,” “back killing,” “back pain,” “back sore” | 10 |
Campbell et al, 2013 [ | A systematic review to study the influence of employment social support in nonspecific back pain | “lumbago,” “backache,” “back ache,” “back pain,” “low back ache,” “low back pain,” “lower back pains” | 7 |
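As an illustration, keyword matching of this kind can be sketched in Python. The keyword list below is a subset of those in the table, and the function name is our own, not part of the study's code:

```python
import re

# Subset of the LBP search keywords listed in the table above.
KEYWORDS = [
    "lumbago", "backache", "back ache", "back pain",
    "my back hurts", "sore back", "back is killing me",
]

# One case-insensitive pattern with word boundaries, so that, e.g.,
# "backache" does not match inside "backaches".
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)

def matches_lbp_keywords(tweet: str) -> bool:
    """Return True if the tweet mentions any LBP search keyword."""
    return PATTERN.search(tweet) is not None
```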
The overall data analysis workflow. The analysis consists of four steps: (1) data preprocessing, (2) thematic analysis using topic modelling, (3) topic labeling and categorization, and (4) domain expert validation. BTM: biterm topic model; DMM: Dirichlet multinomial mixture; GPU-DMM: General Pólya Urn Dirichlet Multinomial Mixture; LDA: latent Dirichlet allocation; NMF: nonnegative matrix factorization.
We removed duplicates, retweets, URLs, and tweets related to marketing and advertisements, which reduced the data set from 7,892,210 to 2,825,645 tweets. We filtered the data further by removing tweets that did not contain first-person pronouns [
We replaced contractions with their expanded forms (eg, “didn’t” to “did not”). We converted HTML characters to ASCII characters and removed hashtags, Unicode strings (eg, “\u2026”), numbers, and punctuation. We replaced abbreviations, elongated words (eg, “gooood” to “good”), and emoticons and emojis with their equivalent English expressions. We then performed spelling correction, lowercasing, tokenization, and lemmatization; created n-grams; and removed stop words (eg, common terms such as “the” and “is”). Finally, we removed duplicates again, leaving 1,249,576 tweets.
After completing the abovementioned steps, we excluded tweets with fewer than three words because in topic modeling, document size is important for achieving high accuracy [
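A minimal Python sketch of these cleaning steps follows. The contraction table and stop word list are illustrative only, and spelling correction, lemmatization, n-gram creation, and emoji replacement are omitted:

```python
import re

# Illustrative (not exhaustive) cleaning resources.
CONTRACTIONS = {"didn't": "did not", "can't": "can not", "i'm": "i am"}
STOP_WORDS = {"the", "is", "a", "and", "to", "in", "my"}

def preprocess(tweet: str) -> list[str]:
    """Expand contractions, strip URLs, hashtags, numbers, and
    punctuation, lowercase, tokenize, and drop stop words.
    Returns [] for tweets left with fewer than three tokens,
    which are excluded from topic modeling."""
    text = tweet.lower()
    for contraction, expansion in CONTRACTIONS.items():
        text = text.replace(contraction, expansion)
    text = re.sub(r"https?://\S+", " ", text)  # URLs
    text = re.sub(r"#\w+", " ", text)          # hashtags
    text = re.sub(r"[^a-z\s]", " ", text)      # numbers, punctuation
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return tokens if len(tokens) >= 3 else []
```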
Topic modeling is a technique used to provide a summary of a large collection of documents by extracting “topics” that represent the dominant themes [
LDA is a generative probabilistic model that assumes each document can be represented by distribution over topics and each topic by distribution over words [
To use these models (except for NMF), we used a Java-based open-source library for short text topic modeling algorithms called STTM (version 1.8) [
Choosing the right number of topics is a crucial step in topic modeling because it can affect the accuracy of the results. The quantitative approach computes the coherence score and perplexity, which help determine the optimal number of topics [
As a quantitative approach, we calculated the coherence score of each model for numbers of topics ranging from 5 to 200, based on the PMI score [
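A PMI-based coherence score of this kind can be sketched as follows. This is our own simplified implementation, averaging pairwise PMI over a topic's top words using document-level co-occurrence; the exact formula used by the STTM library may differ:

```python
import math
from itertools import combinations

def pmi_coherence(top_words, documents, eps=1e-12):
    """Average pairwise PMI of a topic's top words, with probabilities
    estimated from document-level co-occurrence counts."""
    doc_sets = [set(d) for d in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        joint = prob(w1, w2)
        # Pairs that never co-occur contribute 0 in this simplified version.
        scores.append(
            math.log(joint / (prob(w1) * prob(w2) + eps)) if joint > 0 else 0.0
        )
    return sum(scores) / len(scores)
```

Higher scores indicate that a topic's top words tend to appear together in the same documents, which is what the model comparison across topic counts relies on.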
Additionally, we used a qualitative approach to select the most representative topics. We manually examined the topics, their top 20 terms, and a random sample of tweets in each topic. We also created a word cloud for each topic and evaluated the word clouds alongside their sample tweets. We identified the number of topics that provided distinct and meaningful topics; beyond this number, duplicate and overlapping topics began to appear. We combined the quantitative and qualitative approaches to select the optimal number of topics.
Topic labeling is a process of representing the meaning of a topic by assigning each topic a descriptive word or phrase [
LDA assumes that each document (tweet) is a mix of topics with different proportions [
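Assigning each tweet to its dominant topic and tallying proportions can be sketched as follows; the per-tweet topic distributions shown are hypothetical:

```python
from collections import Counter

def dominant_topic(dist):
    """Index of the highest-probability topic in one tweet's topic mix."""
    return max(range(len(dist)), key=dist.__getitem__)

# Hypothetical per-tweet topic distributions over 3 topics.
tweet_distributions = [
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.6, 0.3, 0.1],
]

counts = Counter(dominant_topic(d) for d in tweet_distributions)
proportions = {k: n / len(tweet_distributions) for k, n in counts.items()}
```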
To improve the results of thematic analysis, low-order topics can be grouped under broad, higher-order categories [
Two domain experts (FC, a rheumatologist; DU, a physiotherapist), both clinically active and engaged in research in the area of LBP, independently examined the selected topics from the previous step, in which each topic was represented by its top 20 words, to determine face validity. As previously described, in topic modeling, the top words of each topic provide its description, thereby assisting the domain experts with inferring its meaning [
The total number of collected tweets about LBP was 7,892,210 from 2,420,258 unique users from 2014 to 2018. The average number of words in each tweet increased from 2017 onward (
After performing comprehensive data preprocessing, the final number of retained tweets was 896,867, which represents 11% (896,867/7,892,210) of the original raw data we collected, with a vocabulary size of 29,539. The minimum length of tweets was 4 words and the maximum length was 20 words.
After testing the 5 topic modeling algorithms across different numbers of topics, based on the coherence score and our manual examination, we selected the best model, comprising 60 topics detected from 896,867 self-reported tweets about LBP.
The 60 topics were examined and manually given a topic label. The common and duplicate labels were then grouped into higher-order categories. Word clouds for the two categories of “pain regions” and “sleep” after combining the related topics are provided in
Independent examination of selected topics by two domain experts and reconciliation of any differences resulted in 19 contextual categories, with details presented in
The proportion of tweets for each higher-level category over the years showed that all 19 categories had been discussed by individuals with relatively similar frequency every year (see
The 19 categories and their proportions based on all tweets posted from 2014 to 2018.
The proportions of 19 categories based on the dominant topic per year.
An example of tweets for each contextual category.
Categories | Examples of tweets |
Emotion and beliefs | My back hurts, feeling sad because I wanna get up and do something ! I hate staying in bed :( |
Physical activity | I did 6 miles on my exercise bike yesterday, felt really pleased with myself, and ate healthy. My back hurts today |
Daily life | So my back hurts like hell and I can hardly sit here and do my hair. |
Symptoms | I hate it when my lower back hurts and sends shooting pains down my legs, making them ache and throb. Ugh. |
Sleep | Every time I sleep in my sis guest bedroom my back hurts, that bed is not comfortable. I’d prolly be better off sleeping on the floor |
Pain regions | today is not a good day. my back hurts, my shoulder hurts, my elbow is tingly, a little numb down to my hand and to top it off now my left knee hurts a little. |
Health care | So I have found one good physio and one good chiropracter, both same price, who would you see if you had lower back pain? |
Women | Being pregnant is literally taking everything out of me. I’m exhausted, my back is killing me and I stay moody… |
Aggravating factors | Yesterday I tried doing a back flip on my trampoline. Now, every time I walk my back hurts. When I did the back flip I landed on my head. |
Employment | Hurt my back at work yesterday and I’m working a full 12 hours tomorrow without getting paid. Lovin life right now. |
Entertainment | Watching Cirque Du Soleil: Michael Jackson my back hurts just from watching it |
Religion | Testimony Time! i want to give God the glory for healing me from a severe back pain |
Co-occurring conditions | I don’t know if my back pain is causing depression or my depression is causing back pain… |
Pharmacological therapies | I just took my very first Oxycodone for lower back pain. I think I’m in love. It didn’t just kill the pain. It assassinated it. |
Self-treatments | Coconut oil epsom salt & vapor bath oil just soothed my back pain away |
Social support | Told mom my back hurts she offered to rub my feet an back I have the best mom ever |
Food and drink | my back is killing me cant get out ov bed but need coffee |
Weather | I love cold weather but it’s really not helping with my back pain. Where is that warm summer weather attttttt. |
Not being understood | OMG no one understands the pain I'm in right now. My back is killing me. |
In this study, we identified 60 specific topics from 896,867 tweets about LBP and grouped them into 19 categories that relate to contextual variables of LBP. The top category was “emotion and beliefs,” with 157,563/896,867 tweets (17.6%), followed by “physical activity” (124,251/896,867, 13.85%) and “daily life” (80,730/896,867, 9%), while “food and drink,” “weather,” and “not being understood” had the lowest proportions of tweets (11,551/896,867, 1.29%; 10,109/896,867, 1.13%; and 9180/896,867, 1.02%, respectively). There were 11 topics within the category of “emotion and beliefs”; of 157,563 tweets in this category, 113,562 (72%) expressed negative sentiment. Our results were consistent with the general findings from traditional study methods in the area of LBP but provided more in-depth detail on the context of LBP from the individual perspective.
Our study examined contextual variables to provide a novel insight into first-person perspectives of the LBP experience and confirmed the broad areas that have previously been identified using more traditional data collection methods from qualitative and quantitative studies. For example, psychosocial factors have an important role in LBP [
Our study also highlighted areas related to the pain experience that have not been adequately explored in the literature but that play an important role in the effectiveness of LBP interventions and self-management behaviors, such as the “not being understood,” “religion,” and “food and drink” categories. Although the category of “not being understood” had the smallest proportion of tweets, with a total of 9180, its top five words were “make,” “people,” “stop,” “thing,” and “complain.” This is consistent with a previous systematic scoping review that examined what patients want from their medical care, which reported that patients felt misunderstood and wanted legitimation of their LBP [
The category of “food and drink” is novel and interesting. The tweets included words relating to the type of food (eg, pizza, chocolate, cookies and cream), mealtimes (such as breakfast and lunch), and the process of bringing or making food. Although they reflect important daily habits of eating and drinking, they may also highlight issues around pain affecting an individual’s capacity to eat and drink and/or problems associated with weight and in particular obesity [
There are well-described sex differences in the prevalence of back pain [
There are some limitations to our study. Although the keywords were taken from existing studies about LBP and approved by domain experts, some keywords, such as “back hurt” and “back pain,” were very broad. Therefore, the data collected might not have been specific to LBP. Selection of the right keywords in Twitter data analysis is very important to avoid unrelated data that could reduce the accuracy of results. Filtering and cleaning of Twitter data is also crucial for achieving high accuracy of results. In our study, we performed rigorous data cleaning, but our manual examination showed that a group of tweets contained a few lines from the lyrics of a popular hip-hop song (Bad and Boujee) by Migos, including “…So my money makin' my back ache”; one of our search keywords was “back ache.” Although there are many tools and methods available to automatically perform data cleaning, it is always necessary to manually inspect the results.
Twitter users tend to be younger and might not represent the general population; therefore, the results must be carefully interpreted [
To determine the optimal number of topics, we used the coherence score, a widely used method, and then manually examined and compared the models. This process can be further improved by using other measures such as heuristic approaches [
We also recognize that manual labeling of topics can be subjective. Two domain experts with extensive knowledge were involved in the labeling and examination of selected topics, but future work in this area could involve a larger and more diverse group of domain experts to further reduce this subjectivity.
Our findings provided useful insights into individuals’ beliefs and perspectives regarding their needs and concerns related to LBP that complement the information available in the literature. Considering the contextual factors identified in this study rather than simply focusing on a biomedical model of LBP could address the needs of patients more holistically, help with improving LBP outcomes, and increase patient satisfaction. These findings have the potential to assist health care providers and clinicians with developing more effective, personalized therapies for LBP. There is also the potential to use social media to identify any major changes in community beliefs and needs regarding LBP that can be addressed in a timelier manner.
The average number of words in tweets per year.
Coherence scores for latent Dirichlet allocation, Dirichlet multinomial mixture (DMM), General Pólya Urn Dirichlet Multinomial Mixture (GPU-DMM), biterm topic model, and nonnegative matrix factorization with the number of topics ranging from 5 to 200.
The best model selected with 60 topics and their top 20 terms.
Word clouds for the pain region and sleep categories.
Total number of tweets for each manually labelled topic.
The 19 contextual categories related to low back pain.
The total and percentage of tweets for each contextual category.
application programming interface
biterm topic model
Dirichlet multinomial mixture
General Pólya Urn Dirichlet Multinomial Mixture
low back pain
latent Dirichlet allocation
nonnegative matrix factorization
pointwise mutual information
short text topic modeling algorithm
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors. DU was supported by a National Health and Medical Research Council Career Development Fellowship (Level 2; 1142809).
PDH, FB, DU, and FC contributed to study concept and design. R contributed to data collection and topic modeling. PDH, DU, and FC contributed to topic labeling and clustering. PDH, FB, DU, and FC contributed to interpretation of data. R and PDH contributed to drafting of the initial manuscript. PDH, FB, DU, and FC contributed to critical revision of the manuscript for important intellectual content. R and PDH provided administrative, technical, or material support. All authors approved the final version of the manuscript.
None declared.