Partisan Differences in Twitter Language Among US Legislators During the COVID-19 Pandemic: Cross-sectional Study

Background: As policy makers continue to shape the national and local responses to the COVID-19 pandemic, the information they choose to share and how they frame their content provide key insights into the public and health care systems.
Objective: We examined the language used by the members of the US House and Senate during the first 10 months of the COVID-19 pandemic and measured content and sentiment based on the tweets that they shared.
Methods: We used Quorum (Quorum Analytics Inc) to access more than 300,000 tweets posted by US legislators from January 1 to October 10, 2020. We used differential language analyses to compare the content and sentiment of tweets posted by legislators based on their party affiliation.
Results: We found that health care–related themes in Democratic legislators’ tweets focused on racial disparities in care (odds ratio [OR] 2.24, 95% CI 2.22-2.27; P<.001), health care and insurance (OR 1.74, 95% CI 1.7-1.77; P<.001), COVID-19 testing (OR 1.15, 95% CI 1.12-1.19; P<.001), and public health guidelines (OR 1.25, 95% CI 1.22-1.29; P<.001). The dominant themes in the Republican legislators’ discourse included vaccine development (OR 1.51, 95% CI 1.47-1.55; P<.001) and hospital resources and equipment (OR 1.22, 95% CI 1.18-1.25). Nonhealth care–related topics associated with a Democratic affiliation included protections for essential workers (OR 1.55, 95% CI 1.52-1.59), the 2020 election and voting (OR 1.31, 95% CI 1.27-1.35), unemployment and housing (OR 1.27, 95% CI 1.24-1.31), crime and racism (OR 1.22, 95% CI 1.18-1.26), public town halls (OR 1.2, 95% CI 1.16-1.23), the Trump Administration (OR 1.22, 95% CI 1.19-1.26), immigration (OR 1.16, 95% CI 1.12-1.19), and the loss of life (OR 1.38, 95% CI 1.35-1.42). The themes associated with the Republican affiliation included China (OR 1.89, 95% CI 1.85-1.92), small business assistance (OR 1.27, 95% CI 1.23-1.3), congressional relief bills (OR 1.23, 95% CI 1.2-1.27), press briefings (OR 1.22, 95% CI 1.19-1.26), and economic recovery (OR 1.2, 95% CI 1.16-1.23).
Conclusions: Divergent language use on social media corresponds to the partisan divide in the first several months of the course of the COVID-19 public health crisis.
(J Med Internet Res 2021;23(6):e27300) doi: 10.2196/27300


Introduction
The novel COVID-19 pandemic continues to surge throughout the world. The United States' federal and state policy responses continue to shift and vary throughout the stages of the pandemic [1]. Notable divisions related to public health measures and frameworks for closing and reopening local economies have proliferated [2]. A unique aspect of the COVID-19 pandemic is the role that social media plays in housing, disseminating, and amplifying information and opinions [3,4]. US legislators have also taken to social media to connect with their constituents, comment on the pandemic, and provide information across a spectrum of pandemic-related content to individuals.
Understanding what content US legislators are sharing through social media posts (eg, Twitter) and how they are relaying COVID-19-related information is important, as these issues guide public knowledge and public opinion and inform policy change. By using social media data, prior studies have identified growing partisan differences between Republican and Democratic legislators as the pandemic has progressed [5]. It has also been found that tweets about specific topics (eg, social distancing) from legislators are often associated with the time when policies are put into action, and the effect of such tweets is larger in Democratic counties [6].
The objective of this study was to analyze the language in posts by US legislators on Twitter (a leading social media platform) over the course of the pandemic, to identify potential health care-related themes in COVID-19-related posts, and to analyze the associated sentiment within tweet language across parties.

Data
We identified state legislators' Twitter posts that were related to COVID-19 and posted from January 1 to October 10, 2020, by using Quorum (Quorum Analytics Inc) [7], a software platform that collects policy-related documents, including social media posts from politicians that were posted during their time in office. This study was considered exempt from review by the University of Pennsylvania Institutional Review Board, as it involves the analysis of public-facing data.

Language Feature Extraction
We extracted the relative frequency of single words and phrases from tweets by using the Differential Language Analysis ToolKit package [8] and created two sets of features: (1) an open vocabulary that was defined by using latent Dirichlet allocation [9], an unsupervised clustering algorithm, to create 50 data-driven word clusters (topics) and (2) sentiment, which was measured by using the National Research Council (NRC) Canada lexicon [10], a data-driven dictionary containing words associated with positive and negative valence. The NRC lexicon was developed by using a corpus of 77,500 positive and negative tweets and consists of 54,129 weighted unigrams and 316,531 bigrams in which the weight corresponds to the degree of association between a token and sentiment [10].
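As an illustration of the two ingredients described above (relative word frequencies and a weighted sentiment lexicon), the following minimal Python sketch scores a tweet as the frequency-weighted sum of lexicon weights. It is not the Differential Language Analysis ToolKit implementation. The whitespace tokenizer, the function names, and the toy lexicon are assumptions made for illustration, and the weights are invented rather than taken from the NRC lexicon.

```python
from collections import Counter

# Hypothetical toy lexicon (token -> weight). Positive weights indicate
# positive valence. These values are invented and are NOT NRC weights.
TOY_LEXICON = {"great": 1.5, "safe": 0.8, "crisis": -1.2, "loss": -1.0}


def relative_frequencies(text):
    """Return each token's share of all tokens in the text."""
    tokens = text.lower().split()  # assumption: simple whitespace tokenizer
    total = len(tokens)
    if total == 0:
        return {}
    return {tok: n / total for tok, n in Counter(tokens).items()}


def sentiment_score(text, lexicon=TOY_LEXICON):
    """Sum of lexicon weights, each weighted by the token's relative frequency."""
    freqs = relative_frequencies(text)
    return sum(freq * lexicon.get(tok, 0.0) for tok, freq in freqs.items())
```

For example, `relative_frequencies("stay safe stay home")` assigns 0.5 to "stay" and 0.25 to each of the other two tokens. Tokens absent from the lexicon contribute nothing to the score.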

Statistical Analyses
To distinguish linguistic differences across political parties (coded as a dichotomous outcome), each feature set was entered into a logistic regression model, and features that differed significantly according to a cutoff Benjamini-Hochberg-corrected P value of <.001 were reported [11]. Two authors independently evaluated each topic for thematic meanings by reviewing the top 10 posts per topic and coded the topics into health care-related and nonhealth care-related themes.
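The multiple-comparison step and the effect-size reporting can be sketched as follows. This is a minimal pure-Python illustration, not the authors' code: the function names are hypothetical, and the Wald-type 95% CI (coefficient plus or minus 1.96 standard errors, exponentiated) is an assumption about how an interval could be derived from a logistic regression coefficient.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg-adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that adjusted
    # values never increase as the raw P values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a logistic coefficient and its standard error to (OR, lower, upper).

    Assumption: a Wald interval on the log-odds scale.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

Under this sketch, a topic would be reported when its adjusted P value falls below .001, and an OR above 1 indicates a positive association with whichever party is coded as 1 in the model.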
Data on changes in the prevalence and sentiment of topics that were significantly associated with either party and occurred over time were obtained by calculating the mean scores across all posts per week, stratified by party, and visualized via locally estimated scatterplot smoothing [12].
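The weekly aggregation and smoothing described above can be sketched in Python as follows. This is an illustrative simplification rather than the authors' pipeline: the `(date, party, score)` input layout, the ISO-week grouping, the `frac` span, and the single-pass local linear fit with tricube weights (no robustness iterations) are all assumptions.

```python
import math
from collections import defaultdict
from datetime import date


def weekly_means(posts):
    """Average scores per (party, ISO year, ISO week).

    posts: iterable of (datetime.date, party, score) tuples (assumed layout).
    """
    sums = defaultdict(lambda: [0.0, 0])
    for day, party, score in posts:
        iso = day.isocalendar()
        key = (party, iso[0], iso[1])
        sums[key][0] += score
        sums[key][1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


def loess(xs, ys, frac=0.5):
    """Local linear smoother with tricube weights, evaluated at each x."""
    n = len(xs)
    k = max(2, math.ceil(frac * n))  # number of neighbors in each local fit
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[min(k, n) - 1] or 1.0  # bandwidth: k-th nearest distance
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted
```

In this sketch, the weekly means for one party would be ordered by week, and the week index and mean score would be passed to `loess` as `xs` and `ys` to obtain the smoothed trend for that party.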

US Legislators' Tweets
We identified 309,438 COVID-19-related tweets from the 4224 unique accounts of US legislators. The descriptive statistics of the data set are in Table 1. The number of tweets per legislator over the selected time period is shown in Multimedia Appendix 1. Tweet language that correlated with legislators' party affiliation is displayed in Figure 1. Of the statistically significant topics, we identified 7 health care-related themes and 14 nonhealth care-related themes associated with the two major party affiliations.

Thematic Differences by Party Affiliation
Health care-related themes associated with each party affiliation are listed in Table 2. Nonhealth care-related topics were also identified across parties. The themes associated with a Democratic affiliation included the following: protections for essential workers, the 2020 election and voting, unemployment and housing, crime and racism, public town halls, the Trump Administration, immigration, and the loss of life. The themes associated with a Republican affiliation included the following: China, congressional relief bills, small business assistance, press briefings, and economic recovery (Table 3). The prevalence of the themes over time stratified by affiliation is shown in Multimedia Appendices 2 and 3. The set of topics that did not significantly correlate with affiliation is shown in Multimedia Appendix 4.
Table 2. Health care-related topics that are more likely to be posted by Democratic legislators and Republican legislators. Effect size is shown by using odds ratios (ORs) along with 95% CIs. Only topics that were significant after Benjamini-Hochberg correction (P<.001) are shown.

Sentiment Differences by Party Affiliation
We performed an analysis of sentiment for the language used in tweets and found that overall, Republican-affiliated tweets used more positive sentiment, which increased over time. The variation in overall sentiment is shown in Figure 2. Negative sentiment was associated with content from both parties across the following themes: health care and insurance, COVID-19 testing, and racial disparities. Positive sentiment was associated with content within the theme of government public health expertise. Sentiment within themes over time and across parties is shown in Multimedia Appendices 5 and 6.

Discussion
By using machine learning techniques, we investigated narrative content in over 300,000 Twitter posts from US legislators over the course of the COVID-19 pandemic to date. Investigating the language within posts on social media platforms has become more common and has been specifically used to study aspects of health and health care. This study is among the first to analyze US legislators' Twitter-based language to identify the COVID-19-related themes that policy makers are discussing on Twitter with a specific focus on health care-related topics. Additionally, this study deployed advanced language assessments that use machine learning to analyze how legislators are talking about these themes by conducting sentiment analyses throughout the phases of the pandemic.
We noted key differences across the two major US political parties. Health care-related themes that correlated with a Democratic party affiliation focused on health care access and racial disparities. The themes that correlated with a Republican party affiliation focused on initial and persistent vaccine progress, access to equipment (eg, personal protective equipment), and government expertise. Furthermore, the language analysis showed that content posted by Republican legislators included considerably more discussion of the pandemic and approaches for managing it across health care topics. Language analysis was also used to detect thematic differences in narrative content within Twitter posts across the two major political parties. In this study, our results indicated that legislators with a Democratic party affiliation focused their COVID-19 content more toward social services and racial disparities. Content from Republican-affiliated legislators focused thematically on government relief and economic aid. This finding is consistent with surveys of elected officials and the general public, which suggest that awareness and concern about health disparities among Democrats are greater than those among Republicans [13,14].
There are limitations to this study, including the fact that content was collected from publicly available Twitter posts; thus, legislators who do not post content were not included. If a legislator did not have a party affiliation (as noted by the Quorum database), we could not include them in this analysis. We also did not control for demographic or health access data, as our analysis was performed on the language of individual legislators. Further, a topic's significant association with a particular affiliation does not imply that other party legislators did not tweet about it; it only indicates the relative frequency of tweets containing the words that were associated with each topic.
This study highlights the ability to understand how legislators use social media (eg, Twitter); what information they choose to share; and how they frame their content, which was determined through sentiment analysis [15]. These are key insights that will remain important to the public and health care systems as policy makers continue to shape the national and local responses to the pandemic [16].