This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
The COVID-19 pandemic and its corresponding preventive and control measures have increased the mental burden on the public. Understanding and tracking changes in public mental status can help optimize public mental health intervention and control strategies.
This study aimed to build a social media–based pipeline that tracks changes in public mental health and to use it to understand public mental health status regarding the pandemic.
This study used COVID-19–related tweets posted from February 2020 to April 2022. The tweets were downloaded using unique identifiers through the Twitter application programming interface. We created a lexicon of 4 mental health problems (depression, anxiety, insomnia, and addiction) to identify mental health–related tweets and developed a dictionary for identifying health care workers. We analyzed temporal and geographic distributions of public mental health status during the pandemic and further compared distributions among health care workers versus the general public, supplemented by topic modeling on their underlying foci. Finally, we used interrupted time series analysis to examine the statewide impact of a lockdown policy on public mental health in 12 states.
We extracted 4,213,005 tweets related to mental health and COVID-19 from 2,316,817 users. Of these tweets, 2,161,357 (51.3%) were related to “depression,” whereas 1,923,635 (45.66%), 225,205 (5.35%), and 150,006 (3.56%) were related to “anxiety,” “insomnia,” and “addiction,” respectively. Compared to the general public, health care workers had higher risks of all 4 types of problems (all
The impact of COVID-19 and the corresponding control measures on the public’s mental status is dynamic and varies by type of mental health problem, occupation, and region. Health agencies and policy makers should primarily focus on depression (reported by 51.3% of the tweets) and insomnia (which has had an ever-increasing trend since the beginning of the pandemic), especially among health care workers. Our pipeline tracks and analyzes public mental health changes in a timely manner, which is especially useful when primary studies and large-scale surveys are difficult to conduct.
The global COVID-19 pandemic has drastically changed people’s daily lives since the first confirmed case in December 2019 [
Studies have pointed out that health care workers in the United States experience psychological distress, facing high levels of anxiety, depression, and burnout during the pandemic [
Due to their large scale, immediacy, and comprehensive coverage, social media platforms (such as Twitter, Facebook, and Weibo) have been vital data sources for analyzing public perceptions in a timely manner when primary studies and large-scale surveys are difficult to conduct. For example, Chew et al [
Finally, there is inconsistency in studying the effect of lockdown policies—one of the most highly debated topics related to mental health during the pandemic. Das et al [
To fill in these research gaps and potentially resolve the inconsistency, this study aimed to use related data from February 1, 2020—the beginning of the pandemic—to April 30, 2022, to analyze public mental status, problem types, their temporal and geographic distributions during COVID-19, as well as the effects of lockdown policies on public mental health across states (Figure S1 in
1. What types of mental health problems were the most frequent?
2. What mental health–related topics were the public the most concerned about, and how did relevant discussions change over time?
3. Are there differences in mental health concerns between the general population and health care workers?
4. How did lockdown policies impact public mental health?
To answer question 1, two mental health experts from our team curated a mental health lexicon for Twitter that categorizes related tweets into 4 common mental health problems: anxiety, depression, insomnia, and addiction. Based on this lexicon, we extracted related tweets and visualized their distributions by week and state. To answer questions 2 and 3, we built a pipeline to identify potential health care workers, used a topic model to summarize related tweets into 16 topics, and compared the topic distributions between health care workers and the general population. To answer question 4, we identified tweets related to mental health issues and compared their proportions before and after lockdown policies across different US states.
We collected and downloaded COVID-19–related tweets from February 1, 2020, to April 30, 2022, from Twitter’s application programming interface using the unique tweet ID provided by an open-source COVID-19 tweet database [
This study was conducted with approval by the Institutional Review Board of Zhejiang University (ZGL202201-2).
We removed tweets that contained URLs because such tweets often included only summaries or quotations of the original contents (169,660,346 tweets remained). A psychiatrist and a psychologist curated a mental health lexicon with 231 keywords. The keywords were categorized into 4 subgroups: anxiety, depression, insomnia, and addiction (Table S1 in
Data collection and preprocessing.
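The preprocessing described above (dropping tweets that contain URLs, then matching the remaining tweets against a categorized keyword lexicon) can be sketched as follows. This is a minimal illustration, not the released code of this study: the small `LEXICON` is a hypothetical stand-in for the 231-keyword lexicon, and the function names and the whole-word, case-insensitive matching rule are assumptions.

```python
import re

# Hypothetical stand-in for the curated lexicon (the real one has 231 keywords).
LEXICON = {
    "anxiety": ["anxiety", "anxious", "panic"],
    "depression": ["depression", "depressed", "hopeless"],
    "insomnia": ["insomnia", "sleepless", "can't sleep"],
    "addiction": ["addiction", "addicted", "relapse"],
}

# Assumed URL detector: anything starting with http(s):// or www.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)


def _compile(keywords):
    # Whole-word, case-insensitive match for any keyword of one category.
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)


CATEGORY_PATTERNS = {cat: _compile(kws) for cat, kws in LEXICON.items()}


def contains_url(text):
    return URL_PATTERN.search(text) is not None


def label_tweet(text):
    """Return the set of lexicon categories whose keywords occur in the text."""
    return {cat for cat, pat in CATEGORY_PATTERNS.items() if pat.search(text)}


def filter_tweets(tweets):
    """Drop tweets with URLs; keep (text, categories) for lexicon matches."""
    kept = []
    for text in tweets:
        if contains_url(text):
            continue
        categories = label_tweet(text)
        if categories:
            kept.append((text, categories))
    return kept
```

In this sketch a single tweet can carry more than one category label, which would be consistent with the 4 category percentages reported in this paper summing to more than 100%.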
The geographic information of users was collected from 2 fields of the tweets: (1) the “place” field in tweet metadata and (2) the “location” variable nested in the “user” field of tweet metadata. The “place” information was chosen as the primary evidence of the users’ geographic information, since it is generated from GPS data and is, therefore, more accurate than the information from the self-reported “location” field. We used a list of US state names to extract users’ geographic information (“Methods” in
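The priority rule described above (use the GPS-derived "place" field first and fall back to the self-reported "location" field) can be sketched as follows. This is an illustration under stated assumptions: the `full_name` key inside "place" follows the Twitter API v1.1 tweet object, `US_STATES` is a shortened hypothetical list, and matching on full state names only (no abbreviations) is a simplification.

```python
import re

# Shortened, hypothetical list; a real run would include all US state names.
US_STATES = ["Michigan", "North Carolina", "Ohio", "Pennsylvania",
             "Kansas", "Arkansas", "Virginia", "West Virginia"]

# Longest names first, so "West Virginia" is tried before "Virginia".
_STATE_PATTERNS = [
    (name, re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE))
    for name in sorted(US_STATES, key=len, reverse=True)
]


def match_state(text):
    """Return the first state name found in the text, or None."""
    if not text:
        return None
    for name, pattern in _STATE_PATTERNS:
        if pattern.search(text):
            return name
    return None


def extract_state(tweet):
    """Prefer the "place" field; fall back to the user's "location"."""
    place = tweet.get("place") or {}
    state = match_state(place.get("full_name"))
    if state is not None:
        return state
    user = tweet.get("user") or {}
    return match_state(user.get("location"))
```

Word-boundary matching keeps "Kansas" from matching inside "Arkansas"; tweets for which neither field yields a state return `None` and would be left out of the state-level analysis.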
The Latent Dirichlet Allocation model [
To identify health care workers, we built a health care worker identification lexicon, whose keywords can be roughly divided into 3 groups: occupation, degree, and the title of the association (“Methods” in
We used standard descriptive statistics (medians and IQRs) to summarize the proportions of the 4 types of mental health–related tweets. The Wilcoxon matched-pairs signed-rank test was used to compare differences between health care workers and the general population. Interrupted time series analysis [
Data preprocessing selected 4,213,005 mental health–related tweets from 2,316,817 users (
The trends of the weekly numbers of new COVID-19 cases and mental health–related tweets in 4 subgroups are shown in Figure S2 in
Trends of 4 types of mental health symptom–related tweets by the proportion of tweets.
Proportion distribution of mental health–related tweets in the United States.
The most frequent terms for mental health–related tweets were “people,” “worried,” “shame,” “panic,” “lockdown,” “anxiety,” “mask,” etc (Figure S3 in
Dynamic characteristics of topic proportions.
We assessed the differences in the proportions of 4 mental health symptom–related tweets between health care workers and the general population and showed the results in
Comparison of proportions of mental health–related tweets between health care workers and the general population.
Mental health symptom | Health care workers (% tweets), median (IQRa) | General population (% tweets), median (IQRa) | W | P value |
Anxiety | 1.103 (1.02-1.187) | 1.025 (0.956-1.094) | 2120 | <.001 |
Depression | 1.519 (1.396-1.642) | 1.255 (1.171-1.339) | 26 | <.001 |
Insomnia | 0.251 (0.175-0.328) | 0.131 (0.093-0.17) | 7 | <.001 |
Addiction | 0.139 (0.114-0.164) | 0.086 (0.079-0.094) | 185 | <.001 |
aIQR and Wilcoxon matched-pairs signed-ranks test were applied to compare the differences between the 2 groups.
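For readers unfamiliar with the W statistic in the table above, the following is a minimal sketch of how a Wilcoxon matched-pairs signed-rank statistic is computed. It is not the analysis code of this study: the pairing unit (for example, the proportion in the same week for both groups) is an assumption, zero differences are dropped, tied absolute differences receive average ranks, and no P value is computed.

```python
def wilcoxon_signed_rank(x, y):
    """Return (w_plus, w_minus, w) for paired samples, with w = min(w_plus, w_minus)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    # Paired differences; pairs with no difference are dropped.
    diffs = [a - b for a, b in zip(x, y) if a != b]

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # Sum the ranks separately for positive and negative differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, min(w_plus, w_minus)
```

A small W relative to its maximum possible value, n(n+1)/2 for n nonzero pairs, indicates that the differences point mostly in one direction.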
The distribution of tweets in topics for health care workers and the general population. (A) Average number of tweets per user in each topic. (B) Logarithmic ratio of the average number of tweets between health care workers and the general population on each topic. The ratio equals the average number of tweets per user among health care workers divided by the average number of tweets per user among the general population.
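The ratio in panel B can be written out as a short calculation. This is a sketch only: the function and argument names are hypothetical, the natural logarithm is an assumption (the caption does not state the base), and topics with a zero average in either group are skipped because the logarithm is undefined there.

```python
import math


def log_ratio_by_topic(hcw_counts, hcw_users, general_counts, general_users):
    """Log of (tweets per health care worker) / (tweets per general user), by topic."""
    ratios = {}
    for topic, hcw_tweets in hcw_counts.items():
        hcw_average = hcw_tweets / hcw_users
        general_average = general_counts.get(topic, 0) / general_users
        if hcw_average > 0 and general_average > 0:
            ratios[topic] = math.log(hcw_average / general_average)
    return ratios
```

A positive value means that health care workers posted more tweets per user on that topic than the general population, and a negative value means fewer.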
We selected 12 states with more than 20,000 related tweets during the study period to explore the effect of lockdown policies on public mental status. We report the significant results found in Michigan, Pennsylvania, North Carolina, and Ohio (analysis results of the other 8 states are displayed in Figure S5 in
Daily proportion of mental health–related tweets before and after lockdown policies.
The impact of lockdown policies on public mental health.
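State | Date | Intercept, coefficient | P value | Timea, coefficient | P value | Policyb, coefficient | P value | Time*policyc, coefficient | P value | F statistic | P value |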
Michigan | March 24, 2020 | 0.0528 | <.001 | –0.0021 | .003 | –0.0214 | .17 | 0.002 | .03 | 4.669 | .009 |
North Carolina | March 30, 2020 | 0.0461 | <.001 | –0.0015 | .04 | –0.0228 | .16 | 0.0017 | .08 | 2.509 | .08 |
Ohio | March 23, 2020 | 0.0429 | <.001 | –0.0013 | .03 | –0.0117 | .39 | 0.0012 | .14 | 2.078 | .13 |
Pennsylvania | April 1, 2020 | 0.0254 | <.001 | 0.0002 | .63 | 0.0288 | .007 | –0.0012 | .04 | 3.033 | .046 |
aTime: a continuous variable encoding the number of days in the research period (15 days before and after lockdown).
bPolicy: a binary variable, encoded as 0 before the lockdown policy and 1 after the policy.
cTime*policy: the interaction term of time and policy.
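The variables defined in these footnotes correspond to a segmented regression of the form proportion = b0 + b1*time + b2*policy + b3*time*policy. The following is a minimal sketch of such a fit by ordinary least squares in plain Python, not the analysis code of this study. The function names are hypothetical, the day index is assumed to start at 1, the lockdown day is assumed to be the first "after" day, and no P values or F statistic are computed.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_its(proportions, n_pre):
    """Fit y = b0 + b1*time + b2*policy + b3*time*policy by least squares.

    proportions: daily proportions of mental health-related tweets, in order.
    n_pre: number of days before the lockdown (15 in the table above).
    """
    rows = []
    for day in range(1, len(proportions) + 1):
        policy = 1.0 if day > n_pre else 0.0
        rows.append([1.0, float(day), policy, day * policy])
    k = 4
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, proportions)) for i in range(k)]
    b0, b1, b2, b3 = solve_linear_system(xtx, xty)
    return {"intercept": b0, "time": b1, "policy": b2, "time*policy": b3}
```

In this parameterization, the "policy" coefficient captures the level change after the lockdown and the "time*policy" coefficient captures the change in slope, which is how the columns of the table above can be read.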
We investigated public mental status over more than 2 years from the beginning of the pandemic (February 2020 to April 2022) by analyzing topics of Twitter discussions, examining potential differences between health care workers and the general population, and studying the impacts of statewide lockdown policies. We found that anxiety and depression problems were frequently mentioned on Twitter during the study period, and the proportion of insomnia discussions increased continuously. The content analysis of mental health–related tweets revealed potential reasons: control measures, economic collapse, pressure from unemployment, and so on. Based on Twitter mentions, we found that all 4 mental health problems studied in this paper (addiction, anxiety, depression, and insomnia) were significantly more prevalent among health care workers than the general population. Finally, lockdown policies had different influences on public mental health status in different states. Among the 12 states studied, the negative effect of lockdown policies on public mental health was significant in Pennsylvania but not the other states.
Consistent with research on similar topics, we found that COVID-19 has severely impacted public mental health and has dynamic influences on public mental health [
The topic analysis shows that the public was concerned about the pandemic, its prevention, and the economic and educational problems caused by COVID-19. Topics such as “social distancing,” “test results,” “world pandemic,” “COVID-19 news,” and “economic collapse” were both observed in our work and previous studies [
Unlike previous studies that only compare the prevalence of mental health symptoms between health care workers and the general population [
Lockdown policies had various effects on mental health discussions across US states. In Pennsylvania, the lockdown policy showed a positive association with mental health discussions. However, the opposite association was observed in Michigan, North Carolina, and Ohio. The literature also suggests geographically different associations between local lockdown policies and public mental health. For example, Mittal et al [
Previous work on the same topic has either not focused on the subtypes of mental health problems or studied them over short periods. Our work fills these research gaps by focusing on more granular types of mental health problems over a more extended study period. We built a comprehensive pipeline, including temporal, geographic, and discussion topic analyses; comparisons of trends and topics of concern between groups; and the impact of lockdown policies. On top of the analyses, we released the code and contributed 2 lexicons that can be used to identify mental health issues and health care professionals from tweets.
We also acknowledge the following limitations. First, the evaluation of public mental health on social media is inevitably biased due to the underlying population distribution of social media users. For example, older adults and people with low socioeconomic status may have less access to social media. As a result, this study may not reflect accurate attributes of such subpopulations. However, given the sheer number of people on Twitter, the results of this study are helpful and valuable in tracking public mental health during the pandemic. Additionally, future work could consider sampling according to users’ age to avoid this problem. Second, professional psychologists must make precise diagnoses of mental health problems following official heuristics. Therefore, identifying patients using lexicons based on their tweets can introduce false cases. To validate the reliability of the lexicon, we had professional psychiatrists curate the lexicon based on sampled tweets. Third, tweets that contain keywords do not always reflect the user’s mental health status as they can instead be comments on the news or from other people. To reduce this noise, we removed tweets containing URLs in our preprocessing step, as these tweets were usually summarizations or quotes of different information sources.
The proposed pipeline can be applied to study other public mental health problems, such as suicidal thoughts, posttraumatic stress disorder, and paranoia. It can also be applied to studying characteristics of other cohorts, such as sexual minority groups and college students. Regarding the analyses, more data sources (eg, surveys and interviews) could be introduced to validate the conclusions of this research.
This study developed a comprehensive pipeline to use social media for tracking and analyzing public mental status during a pandemic. It also contributed 2 lexicons that could be used in future studies. We found that the impact of COVID-19 and the corresponding control measures on the public’s mental status is dynamic and varies by type of mental health problem, occupation, and region. Health agencies and policy makers should primarily focus on depression (reported by 51.3% of the tweets) and insomnia (which has had an ever-increasing trend since the beginning of the pandemic), especially among health care workers. Our approach works efficiently, especially when primary studies and large-scale surveys are difficult to conduct. It can be extended to track the mental status of other cohorts (eg, sexual minority groups and adolescents) or during different pandemic periods.
Supplementary methods, pictures, and tables.
The proportion and 95% CIs of mental health–related tweets in each state by month.
JY was partially supported by the Key Laboratory of Intelligent Preventive Medicine of Zhejiang Province (2020E10004). The funders had no role in the design and conduct of the study.
The data and code supporting the study’s findings are available at https://github.com/zjumh/mental-health-during-COVID.
ML and JY designed the study and drafted the manuscript. YH prepared the data, provided feedback on the study design, and helped draft and revise the manuscript. ML performed data and statistical analysis. YL and LW built the lexicon of mental health keywords. YL, LZ, and XL provided critical reviews. All authors reviewed the manuscript. ML takes responsibility for the integrity of the work.
None declared.