Background: The COVID-19 pandemic is a traumatic individual and collective chronic experience, with tremendous consequences on mental and psychological health that can also be reflected in people’s use of words. Psycholinguistic analysis of tweets from Twitter provides information about people’s emotional expression, analytical thinking, and somatosensory processes, which are particularly important in the context of traumatic events.
Objective: We aimed to analyze the influence of official Italian COVID-19 daily data (new cases, deaths, and hospital discharges) and the phase of managing the pandemic on how people expressed emotions and their analytical thinking and somatosensory processes in Italian tweets written during the first phases of the COVID-19 pandemic in Italy.
Methods: We retrieved 1,697,490 Italian COVID-19–related tweets written from February 24, 2020 to June 14, 2020 and analyzed them using LIWC2015 to calculate 3 summary psycholinguistic variables: emotional tone, analytical thinking, and somatosensory processes. Official daily data about new COVID-19 cases, deaths, and hospital discharges were retrieved from the Italian Prime Minister's Office and Civil Protection Department GitHub page. We considered 3 phases of managing the COVID-19 pandemic in Italy. We fitted 3 general linear models, 1 for each summary variable as the dependent variable, with daily data and phase of managing the pandemic as independent variables.
Results: General linear models to assess differences in daily scores of emotional tone, analytical thinking, and somatosensory processes were significant (F6,104=21.53, P<.001, R2= .55; F5,105=9.20, P<.001, R2= .30; F6,104=6.15, P<.001, R2=.26, respectively).
Conclusions: The COVID-19 pandemic affects how people express emotions, analytical thinking, and somatosensory processes in tweets. Our study contributes to the investigation of pandemic psychological consequences through psycholinguistic analysis of social media textual data.
Twitter, established in 2006, is a microblogging service used to share information, news, opinions, and even private emotions and to connect people worldwide. It is the 13th most-used social media platform, with 340 million users . In the first quarter of 2020, it registered 166 million average monetizable daily active users, a 24% growth from 2019 [ ]. Twitter itself attributes part of this exceptional growth to a “global conversation related to the COVID-19 pandemic” [ ]. While the coronavirus disease pandemic is affecting the world, regional and national lockdowns are restricting the possibility to travel and physically meet other people: Social networks, including Twitter, now represent a way to keep in touch, exchange information, solve problems, and converse globally. And there is even something more.
Infodemiology is a new research field using online data and defined as “the science of distribution and determinants of information in an electronic medium, specifically the internet, or in a population, with the ultimate aim to inform public health and public policy” . Among the infodemiology indicators, “metrics on the ‘chatter’ in discussion groups, blogs, and microblogs (eg, Twitter)” [ ] are considered. Along this line, various researchers have successfully used this kind of data in the health context. Specifically, Twitter represents a unique opportunity for scholars to recruit participants, deliver interventions, or directly obtain data [ ]. In particular, as a data source, it can provide population-level, real-time, high-volume data that are easily and publicly accessible [ ]: These are beneficial features, especially in the psychosocial field, which normally relies on self-report, time-lagged questionnaires with limited numbers of participants.
Today, Twitter-based health research represents a rapidly developing field, combining different methodologies and applying to various contexts, such as public health, infectious diseases including Ebola and influenza, neurology, and psychiatry . Some studies have also been conducted in the COVID-19 pandemic context, demonstrating the feasibility of using Twitter as a means to collect valuable data to obtain deep insights in this emergency situation. Lwin and colleagues [ ] collected more than 20 million tweets written worldwide during the first phases of the pandemic and studied the emotional responses to COVID-19 by using sentiment analysis. Xue and colleagues [ ] used sentiment analysis alongside unsupervised machine learning and qualitative methods to identify main COVID-19–related themes discussed on Twitter, such as news, cases, and deaths, accompanied by a sentiment of fear.
Other studies, instead, relied on psycholinguistic analysis of Twitter data. Su and colleagues  used psycholinguistic analysis on Weibo and Twitter posts to investigate the psychological impact of lockdown measures in China and Italy: After lockdown, people used more cognitive processes and home words.
Indeed, as demonstrated by a vast amount of literature , the words we use in our daily lives have various links to different psychosocial variables, including mental health, psychological status, and “ongoing emotional and cognitive coping processes, and idiosyncratic reactions to crisis” [ ]. In fact, the pandemic could be considered as “[…] the cause of individual and collective traumas” [ ], that is, also having tremendous consequences on mental and psychological health [ , ].
Overall, psycholinguistic analysis of textual data coming from Twitter offers some advantages. Usually, assessing psychological variables requires the recruitment of a sample of participants, relying on their availability to individually complete questionnaires and instruments. This process is expensive and time-consuming, resulting in a limited amount of data, often biased by the issues associated with self-report instruments, such as a time lag between the event of interest experienced by people and the moment of data collection. Psycholinguistic analysis of Twitter data instead requires downloading, in a fairly fast and automatic way, a massive amount of population-level data in near real time (eg, tweets written immediately after the event of interest) in a discreet and unobtrusive way, resulting in a faster and less expensive process.
Among the psychological variables, psycholinguistic analysis of textual Twitter data could provide information about emotional expressions, analytical thinking, and somatosensory processes, which are particularly important in traumatic event contexts.
Specifically, emotional tone is a psycholinguistic variable that summarizes the presence of positive and negative emotions in written text as the difference between positive-emotion words and negative-emotion words . Individuals’ expressions of emotions in language are connected to the way they experience the world and also react to, and cope with, traumatic events [ ]. In particular, experiencing positive emotions after a challenging event is important for resilience [ ], while some studies highlighted how, after a traumatic experience such as the September 11 attacks, the emotional tone in journal entries by people in the United States was low, in other words characterized by a negative tone, which slowly rebuilt after some time [ ].
Analytical thinking is a psycholinguistic variable that reflects “the degree to which people use words that suggest formal, logical, and hierarchical thinking patterns” . A lower level of analytical thinking reflects a more narrative and personal thinking pattern. The value of cognitive words in trauma narratives remains controversial: These types of words are linked with positive or negative effects on people’s well-being [ ].
In trauma narratives, somatosensory words, such as words related to body, sensory, and perceptual processes, assume great relevance, with a stronger presence than in other neutral or positive-tone narratives . The use of this type of words is associated with the symptoms of posttraumatic stress disorder (PTSD) and depression [ - ].
Our aim was to analyze the influence of the pandemic, measured through official Italian COVID-19 daily data (new cases, deaths, and hospital discharges) and the phase of managing the pandemic, on psycholinguistic variables in Italian tweets written during the first phases of the COVID-19 pandemic in Italy.
The pandemic is characterized by daily information about new cases and deaths and by governments’ decisions and restrictions that impact everyone’s lives: It could be considered a collective and individual traumatic experience . This traumatic experience can have profound psychological consequences on mental health and the well-being of citizens that can also be reflected, as discussed earlier, in people’s use of words, specifically the emotional tone, analytical thinking, and somatosensory processes variables.
Our aim was to analyze the way people express emotions, their analytical thinking, and somatosensory processes in a sample of Italian tweets during the first phases of the COVID-19 pandemic in Italy. Specifically, we were interested in assessing the influence of official Italian COVID-19 daily data (eg, new cases, new deaths, and hospital discharges) as well as the phase of managing the outbreak on tweets occurring during the following 24 hours, specifically on the emotional tone, analytical thinking, and somatosensory processes in tweets.
The dataset used in this study came from a large-scale COVID-19 Twitter chatter project that actively collected COVID-19 tweets from January 1, 2020 (for a brief overview, see ). Specifically, this dataset, which has been made freely available by Banda and colleagues [ ] through Zenodo, includes tweets collected from the publicly available Twitter Stream API with a collection process that gathered any available tweets with keywords related to COVID-19 (eg, “coronavirus,” “2019ncov,” “COVID19,” “COVID-19”). See [ ] for further information on the full list of keywords and the rationale for their selection and inclusion. As of September 20, 2020, this project had collected almost 166 million unique tweets. The project only released the Tweet IDs of the collected tweets; thus, the software DocNow Hydrator was used to extract tweets. This user-friendly software has been proven effective by previous research [ , ]. We only selected tweets in the Italian language created between 6:00 pm on February 24, 2020 and 11:59 pm on June 14, 2020. Both the language and timestamp of tweets are provided directly by Twitter through its API, a tool to contribute to, engage with, and analyze the conversation happening on Twitter. We chose to focus on this period because official data about the COVID-19 outbreak were available from 6:00 pm on February 24, 2020 (ie, 3 days after Italian Patient One tested positive), and “Phase 3,” characterized by a sharp loosening of previous public health measures and restrictions, started on June 15, 2020.
In addition, official data about daily new cases, new deaths, and new discharges from hospital were also retrieved from the GitHub page of the Italian Prime Minister's Office and Civil Protection Department. From February 24, 2020 to April 17, 2020, data on the COVID-19 outbreak in Italy were communicated in a press conference held daily at 6:00 pm by the head of the Civil Protection Department. After April 17, 2020, the daily press conference was no longer held, but official information about the pandemic continued to be released at 6:00 pm through a daily bulletin.
We considered 3 different phases of managing COVID-19, characterized by distinct restrictions and measures to counteract the spread of the virus. The first was the outbreak, from February 24, 2020 (ie, the day on which the official Civil Protection Department 6:00 pm press conference began) to March 8, 2020: Along with the first confirmed indigenous cases, regional and national governments began to take action, including school and university closures, postponing or canceling some public events, and strict lockdown for 11 municipalities in northern Italy. The second was Phase 1, from March 9, 2020 to May 3, 2020: An “I stay home” national decree imposed lockdown in all Italian regions, and citizens were allowed to leave their homes only for documented work, health, or emergency reasons, while nonessential commercial activities were closed. The third was Phase 2, from May 4, 2020 to June 14, 2020: A gradual relaxing of lockdown restrictions began, with reopening of some services and activities, such as parks, museums, restaurants, and bars for take-away service; practicing social distancing remained mandatory.
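The phase boundaries above can be written down as a small lookup. The following is only an illustrative sketch: the names `PHASES` and `phase_for` are hypothetical and are not taken from the study's scripts; only the dates and phase labels come from the text.

```python
from datetime import date

# Phase boundaries as stated in the text (both ends inclusive).
PHASES = [
    (date(2020, 2, 24), date(2020, 3, 8), "Outbreak"),
    (date(2020, 3, 9), date(2020, 5, 3), "Phase 1"),
    (date(2020, 5, 4), date(2020, 6, 14), "Phase 2"),
]


def phase_for(day):
    """Return the pandemic-management phase that a calendar day falls in."""
    for start, end, label in PHASES:
        if start <= day <= end:
            return label
    raise ValueError(f"{day} is outside the study period")


def days_in_study():
    """Count the calendar days covered by the 3 phases."""
    return sum((end - start).days + 1 for start, end, _ in PHASES)
```

Under these boundaries, the 3 phases cover 14, 56, and 42 calendar days, respectively.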
Data use complied with ethical guidelines for internet research . The European Union General Data Protection Regulation 2016/679 allows for the use of anonymous data for research purposes under certain conditions. Since all analyses have been performed on public and anonymized meta-data, no institutional review board approval was required for the use of this database or the completion of this study.
Text mining and text analysis were performed with R version 3.4.3 and Linguistic Inquiry and Word Count (LIWC) 2015. We were interested in understanding whether daily data on the COVID-19 outbreak would affect how people express emotion, cognition, and somatosensory processes in their tweets during the following 24 hours. Thus, before analysis, all tweets were preprocessed: Daily tweets from 6:00 pm to 5:59 pm the following day were merged into a single text file. For instance, the overall corpus for March 1, 2020 included aggregated text coming from 11,707 tweets from 6:00 pm on March 1 to 5:59 pm on March 2. There was a total of 1,692,181 tweets from 6:00 pm on February 24, 2020 to 11:59 pm on June 14, 2020; the number of tweets per day ranged from 6977 (on June 14, 2020) to 33,356 (on May 25, 2020) with a daily average of 15,108.76 (SD 3895.29) tweets.
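The 6:00 pm to 5:59 pm windowing described above can be sketched as follows. This is a minimal illustration, not the study's preprocessing script (which was written in R): the function names are hypothetical, and timestamps are assumed to be naive datetimes already expressed in the same local time as the 6:00 pm data release.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW_START_HOUR = 18  # official data were released at 6:00 pm


def window_day(created_at):
    """Map a tweet timestamp to the date whose 6:00 pm release opens its window."""
    # Shifting back by 18 hours moves 6:00 pm to midnight, so every tweet
    # from 6:00 pm to 5:59 pm the next day lands on the release date.
    return (created_at - timedelta(hours=WINDOW_START_HOUR)).date()


def merge_by_window(tweets):
    """Merge (timestamp, text) pairs into one text per 24-hour window."""
    buckets = defaultdict(list)
    for created_at, text in tweets:
        buckets[window_day(created_at)].append(text)
    return {day: "\n".join(texts) for day, texts in buckets.items()}
```

With this mapping, a tweet written at 5:59 pm on March 2 is assigned to the March 1 corpus, while one written at 6:00 pm on March 2 opens the March 2 corpus.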
Then, each daily text was analyzed with the Italian LIWC2007 Dictionary  and the Italian Function Words Dictionary 2015 of LIWC2015 [ ]. LIWC calculates the percentage of total words in each text that falls into predefined linguistic and psycholinguistic categories. We then computed separate indexes for emotional tone, analytical thinking, and somatosensory processes. Based on previous research, each of these 3 summary variables is constructed from different LIWC categories. First, to calculate the emotional tone score, we employed the procedure described by Cohn et al [ ]. Specifically, tone was computed as (positive emotion) – (negative emotion): thus, the higher the score, the more positive the emotional tone of daily tweets. Second, analytical thinking is a factor-analytically derived dimension based on 8 function word dimensions. This dimension “captures the degree to which people use words that suggest formal, logical, and hierarchical thinking patterns” [ ]. It was computed as (articles) + (prepositions) – (total pronouns) – (auxiliary) – (negations) – (conjunctions) – (adverbs) [ ]: the higher the score, the higher the analytical thinking of the daily tweets. Third, as somatosensory details, in particular words related to body and perception, have been found to be common and important in different studies examining trauma narratives [ ], we decided to calculate a somatosensory index, namely somatosensory processes: This index was computed as (perceptual processes) + (body). These 2 categories captured the use of words related to perceptual experiences (such as “observing, heard, feeling, rumors, touch”) and body parts, processes, or diseases (such as “cheek, hands, spit, cough, flesh, brain, heart, pain, contagious, headache, sick”), tapping into perceptual and sensory features that are meant to be common in this type of narrative. Higher scores in this index imply higher somatosensory experiences expressed in daily tweets.
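The 3 formulas above amount to simple sums and differences of LIWC category percentages. The sketch below restates them in code; the dictionary keys (`posemo`, `negemo`, and so on) are hypothetical labels chosen for illustration and may not match the category identifiers of the Italian dictionaries.

```python
def summary_scores(liwc):
    """Compute the 3 daily summary variables from LIWC category percentages.

    `liwc` maps category names to percentages of total words in a daily text.
    The key names are hypothetical labels, not confirmed LIWC identifiers.
    """
    # Emotional tone: positive emotion minus negative emotion.
    tone = liwc["posemo"] - liwc["negemo"]
    # Analytical thinking: articles and prepositions count positively,
    # the remaining function word categories count negatively.
    analytic = (
        liwc["article"] + liwc["prep"]
        - liwc["pronoun"] - liwc["auxverb"] - liwc["negate"]
        - liwc["conj"] - liwc["adverb"]
    )
    # Somatosensory processes: perceptual processes plus body words.
    somatosensory = liwc["percept"] + liwc["body"]
    return {"tone": tone, "analytic": analytic, "somatosensory": somatosensory}
```

Because all inputs are percentages, negative tone or analytical thinking scores simply indicate that the subtracted categories outweigh the added ones.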
Since emotional tone, analytical thinking, and somatosensory processes were computed for each day by considering all the text coming from daily tweets, in all subsequent analyses, the total sample was the number (ie, 112) of days from February 24, 2020 to June 14, 2020 (with days as the unit of analysis).
We performed 3 general linear models using Jamovi 1.1 [, ], 1 for each of the 3 LIWC summary variables, namely emotional tone, analytical thinking, and somatosensory processes. In each model, the LIWC summary variable was entered as the dependent variable; daily official data about new cases of COVID-19, new deaths, and new discharges were entered as continuous independent variables, while the phase of managing the COVID-19 outbreak was entered as a categorical independent variable (coded as 1=COVID-19 spreading; 2=Phase 1; 3=Phase 2). Specifically, these general linear models assessed whether daily new cases, new deaths, and new hospital discharges, alongside the phases of managing the COVID-19 pandemic, influenced the 3 daily summary variables constructed through LIWC. Besides the main effects, we included second- and third-order interaction terms among the continuous independent variables. We adopted a stepwise backward regression analysis approach. Thus, starting from the full model, nonsignificant, higher-order terms were eliminated one at a time, in order to obtain a final, more parsimonious model. If none of the interaction terms was significant, the final model included only the main effects of all the predictors. For significant interactions, simple slope analysis was performed to test the effect of a specific predictor at different levels (ie, 1 standard deviation above and below the mean) of another predictor. All continuous independent variables were mean centered. The magnitude of each effect was interpreted by considering its associated partial eta squared (ie, ηp2). Specifically, effects were considered weak (.01 < ηp2 ≤ .06), moderate (.06 < ηp2 ≤.14), or strong (ηp2 > .14). The final dataset and the scripts to perform data analysis are available in and , respectively.
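The effect size thresholds above can be applied mechanically. The sketch below also uses the standard identity that recovers partial eta squared from an F statistic and its degrees of freedom, ηp2 = (F × df_effect) / (F × df_effect + df_error). The function names and the label "negligible" for ηp2 ≤ .01 are assumptions made for illustration; the text does not name that range.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def effect_label(eta_p2):
    """Classify an effect using the thresholds stated in the text."""
    if eta_p2 > 0.14:
        return "strong"
    if eta_p2 > 0.06:
        return "moderate"
    if eta_p2 > 0.01:
        return "weak"
    # Assumption: the text does not label effects at or below .01.
    return "negligible"
```

For example, `partial_eta_squared(30.27, 2, 104)` is about .37, which `effect_label` classifies as strong.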
displays the trends over time for emotional tone, analytical thinking, and somatosensory processes (as z scores) as expressed in daily tweets from February 24, 2020 to June 14, 2020.
displays trends over time for daily new cases, new deaths, and new hospital discharges (as z scores) from February 24, 2020 to June 14, 2020.
Results of the 3 general linear models assessing influences on each of the 3 LIWC summary variables, namely emotional tone, analytical thinking, and somatosensory processes, are reported in the table below. For emotional tone, the final general linear model was significant (F6,104=21.53, P<.001) and explained more than 55% of the variance in the dependent variable. Specifically, we found a significant interaction between daily new cases and new deaths for COVID-19 in explaining emotional tone (F1,104=4.10, β=–.24, P=.045, ηp2=.04). The simple slope analysis showed that, when the number of deaths was low (b=–0.00, SE=0.00, t104=–0.13, P=.900) or average (b=–0.00, SE=0.00, t104=–1.27, P=.207), daily new cases of COVID-19 were not related to tone. On the other hand, when the number of deaths was high, the higher the number of daily new cases, the lower the estimated emotional tone was (b=–0.00, SE=0.00, t104=–2.42, P=.017). Other interactions were not significant and, thus, were excluded one at a time by adopting a stepwise backward regression analysis approach. The main effect of daily new cases was not significant (F1,104=1.62, β=–.27, P=.207, ηp2=.02), while the effect of daily new deaths was significant but weak (F1,104=3.63, β=.41, P=.048, ηp2=.03). Moreover, emotional tone was not related to the daily number of new hospital discharges (F1,104=0.12, β=–.03, P=.729, ηp2=.00). Finally, phases of managing the COVID-19 outbreak were responsible for strong differences in emotional tone (F2,104=30.27, P<.001, ηp2=.37). Estimated marginal means of daily scores of emotional tone were –1.08 (SE=0.04) during the outbreak, –0.83 (SE=0.02) during Phase 1, and –0.83 (SE=0.03) during Phase 2. As highlighted by post hoc analyses with a Bonferroni correction (P<.05), daily scores of emotional tone during the first outbreak were lower than the ones reported in both Phase 1 and Phase 2. The 2 latter phases did not differ in daily scores of tone.
|Variables|Emotional tone^a: β|Emotional tone^a: P value|Emotional tone^a: ηp2|Analytical thinking^b: β|Analytical thinking^b: P value|Analytical thinking^b: ηp2|Somatosensory processes^c: β|Somatosensory processes^c: P value|Somatosensory processes^c: ηp2|
|New cases × New deaths|–.24|.045|.04|–^d|–^d|–^d|.44|.004|.08|
^a F6,104=21.53, P<.001, R2=.55.
^b F5,105=9.20, P<.001, R2=.30.
^c F6,104=6.15, P<.001, R2=.26.
The final general linear model performed to assess differences in daily scores of analytical thinking was significant (F5,105=9.20, P<.001) and explained 30% of the variance in the dependent variable. No significant second- and third-order interactions were observed; thus, all interaction terms were excluded one at a time by adopting a stepwise backward regression analysis approach. Analytical thinking was not related to daily new discharges from hospital (F1,105=0.95, β=.10, P=.332, ηp2=.01), while it was negatively and moderately related to daily new cases of COVID-19 (F1,105=11.14, β=–.84, P=.001, ηp2=.10) and positively but weakly linked to new deaths related to COVID-19 (F1,105=5.48, β=.55, P=.021, ηp2=.05). Daily scores of analytical thinking differed moderately among different phases of managing the COVID-19 outbreak (F2,105=7.27, P=.001, ηp2=.12). Estimated marginal means for daily scores of analytical thinking were –2.14 (SE=0.51) during the outbreak, –4.01 (SE=0.29) during Phase 1, and –4.22 (SE=0.32) during Phase 2. As highlighted by post hoc analyses with a Bonferroni correction (P<.05), daily scores of analytical thinking during the first outbreak were lower than the scores reported in both Phase 1 and Phase 2. The 2 latter phases did not differ in daily scores of analytical thinking.
For somatosensory processes, the final general linear model was significant (F6,104=6.15, P<.001) and explained more than 26% of the variance in the dependent variable. Specifically, we found a significant interaction between daily new cases and new deaths related to COVID-19 in explaining somatosensory processes (F1,104=8.79, β=.44, P=.004, ηp2=.08). The simple slope analysis showed that, when the number of deaths was low (b=0.00, SE=0.00, t104=0.19, P=.851) or average (b=0.00, SE=0.00, t104=1.87, P=.065), daily new cases of COVID-19 were not related to somatosensory processes. On the other hand, when the number of deaths was high, the higher the number of daily new cases, the higher the estimated score of somatosensory processes was (b=0.00, SE=0.00, t104=3.55, P<.001). Other interactions were not significant and, thus, were excluded one at a time by adopting a stepwise backward regression analysis approach. The main effect of daily new cases was not significant (F1,104=3.48, β=.51, P=.065, ηp2=.03), while the main effect of daily new deaths was significant and moderate (F1,104=13.69, β=–1.01, P<.001, ηp2=.12). Moreover, the daily number of new hospital discharges was not related to somatosensory processes (F1,104=0.77, β=.11, P=.383, ηp2=.01). Finally, phases of managing the COVID-19 outbreak were not responsible for differences in somatosensory processes (F2,104=0.60, P=.551, ηp2=.01).
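The logic behind the simple slope analyses reported above can be sketched as follows. In a model with a cases × deaths interaction and mean-centered predictors, the slope of new cases at a given level of deaths is the main-effect coefficient plus the interaction coefficient times that level. The function names and the numbers in the usage note are hypothetical and are not the study's estimates.

```python
from statistics import mean, stdev


def mean_center(values):
    """Subtract the mean so that 0 corresponds to the average day."""
    m = mean(values)
    return [v - m for v in values]


def simple_slopes(b_predictor, b_interaction, moderator):
    """Slope of the predictor at low (-1 SD), average, and high (+1 SD) moderator levels.

    Assumes a model of the form
        y = b0 + b_predictor * x + b_moderator * z + b_interaction * x * z
    in which both x and z are mean centered.
    """
    sd = stdev(mean_center(moderator))  # centering leaves the SD unchanged
    return {
        "low": b_predictor - b_interaction * sd,
        "average": b_predictor,
        "high": b_predictor + b_interaction * sd,
    }
```

For instance, with a hypothetical main effect of -1.0, an interaction coefficient of -0.5, and a moderator whose SD is 2, the slope is 0.0 at low levels of the moderator and -2.0 at high levels, the same qualitative pattern described for emotional tone.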
All 3 general linear models to assess differences in daily scores of analytical thinking, emotional tone, and somatosensory processes were significant, with specific and different patterns.
As already pointed out, we might discuss our results considering this pandemic as “the cause of individual and collective traumas” . In fact, different people dealing with the same stressful event could develop various reactions: Some individuals could develop a nonpathological response, with emotional, cognitive, and physical symptoms resolving spontaneously after some days or weeks, the successful implementation of resilience and coping strategies, and a return to a previous baseline without long-lasting consequences. For these individuals, the stressful event remains only “potentially” traumatic. Other individuals, instead, develop more pathological reactions, ranging from adjustment disorders to PTSD, with trauma lived as “a complex emotional response to a stressful event, that overwhelms the individual’s capacity to cope” [ ].
Various studies have analyzed individuals’ language use after a traumatic event (eg, Cohn et al ), but, to the best of our knowledge, this is the first study using these summary variables in a sample of Italian tweets during the first phases of the pandemic. First, in all our general linear models, we did not find any significant effect of the daily number of new hospital discharges on our variables of interest. Compared to daily new cases and new deaths, daily hospital discharges were the only “positive” data considered. The absence of any effect could be due to the negativity bias, which is the human tendency to give more importance and attention to negative data, or entities in general [ ], such as COVID-19 deaths and new cases, while ignoring positive data, such as hospital discharges. The negativity bias has been demonstrated to be related to life stressors and PTSD [ , ], as individuals affected by PTSD tend to focus their attention on potential threats [ ]. This could also explain the fact that, in each model, summary variables were related to the negative data. So, when experiencing a stressful event, such as the pandemic period, individuals may experience negativity bias, focusing more on negative data. Experiencing these data could itself be considered a stressful event. In particular, we found that increases in both daily deaths and daily new cases, in other words the worst situation possible, increased negative emotional tone. This seems intuitive: Negative emotion words are habitually used when writing about a negative event, such as the situation described before, and have been linked with suicide and depression [ ]. Moreover, negative alterations in mood experience, negative affect, and difficulty in experiencing positive emotions are typical reactions experienced after a stressful event and, in some cases, could be symptoms of PTSD [ ].
The same interaction was found to have an effect on increased use of somatosensory words. This result also seems intuitive, as the use of sensory, body, and perceptual words in narratives related to traumatic events is common and often linked to PTSD and its symptoms, even in studies of more individual traumatic events such as traffic accidents [, ].
Regarding analytical thinking, we found 2 opposite effects: Daily new cases were negatively linked with this linguistic marker, while new deaths were positively linked with it. High scores in the analytical thinking variable are related to a formal and logical thinking pattern, while a low score is related to a more narrative style, focused on the here and now . Various studies have considered the use of cognitive words after traumatic events: More cognitive words are often present in trauma- or distress-related narratives [ ]. Using cognitive words is linked to an individual’s effort to elaborate and integrate the event in their own memories [ ], reflecting “an active search for meaning and understanding of the stressful event” [ ]. In fact, using cognitive words, in particular causal and insight ones, when writing about a past event is linked to “the active process of reappraisal” [ ]. So, using more cognitive words is associated with better physical health [ , ], fewer PTSD symptoms [ ], and adaptive coping strategies [ ]. On the contrary, some studies have shown a link between cognitive words and PTSD symptoms [ ]. In fact, as some authors pointed out [ , ], when using LIWC, it is difficult to understand how these words are used, for example referring to “organized or disorganized thoughts” or linked to “ruminative processes and fruitless attempts to assimilate what happened” [ ]. After a traumatic event, indeed, individuals’ thoughts could be affected in different ways. For instance, PTSD symptoms include intrusive and upsetting memories or negative thoughts about themselves and the world or avoidance of thoughts related to trauma [ ]. These different reactions and the disagreement about the meaning of cognitive words after traumatic events may account for these opposite effects.
Even if more data and research are needed, we may cautiously suggest that, when confronted with data on new deaths (the worst news), individuals may try to react using a formal and logical way of writing, trying to make sense of this negative information. When confronted with data on new cases, which are slightly less negative, people may try to avoid the data or react with a more narrative tone, feeling less need to elaborate on them.
The last interesting result we found is the effect of the phase of managing the pandemic on the emotional tone and analytical thinking variables. In particular, both emotional tone and analytical thinking were lower during the outbreak, then increased in the first and second phases. As explained, the initial phase of the pandemic in Italy was characterized by different restrictions and measures taken by the government in order to counteract the spread of the virus. These measures differed, in particular, between the outbreak and the first and second phases. As the first indigenous cases were confirmed at the end of February, but the gravity of the situation was perhaps still unclear, different day-to-day actions and initiatives were taken in each part of Italy: Universities and schools were closed first only in northern regions and initially only for some days; 11 municipalities in Lombardy and Veneto were in strict lockdown; some major public events, such as the Carnival of Venice, were postponed or cancelled; in other regions, considered at lower risk, schools remained open with some events confirmed, such as Serie A soccer matches with the presence of fans in southern Italy. However, contradictory messages hit the population: Fake news stating the closure of all Italian schools circulated at the end of February, while some ads and initiatives reassured people, even in the northern areas, to continue to live their normal lives; all of this contributed to creating a climate of uncertainty. The first and second phases, instead, were characterized by national-level and long-term measures, with a strict lockdown and suspension of nonessential activities in all Italian regions, which gradually loosened at the beginning of May. These 2 phases marked a tragic and dramatic situation but were more stable and predictable in their restrictions.
These differences between the very first and the other phases could account for the differences found in our summary variables. Uncertainty about future events, such as people may have experienced during the outbreak phase regarding future restrictions and the development of the emergency, is common in threat contexts and could elicit negative emotions, such as anxiety and fear . After the situation became more stable in the subsequent phases, with less uncertainty, emotional tone may have increased. This emotional tone pattern confirms other results obtained in the COVID-19 pandemic and in other trauma contexts: Sadiković and colleagues [ ] found decreased worry, fear, and boredom over 5 weeks after the first COVID-19–confirmed case in Serbia. Cohn and colleagues [ ] found that, immediately after the September 11 attacks, emotional tone measured in a sample of online journals was low, returning slowly to baseline after 1 week. Experiencing negative emotions is a typical reaction after an emotional upheaval and an uncertain and threatening situation, even representing a specific criterion for PTSD [ ]; experiencing positive emotions after a crisis acts as a buffer against depression in resilient individuals [ ], and positive emotions in trauma narratives are linked to better adaptation or less severe PTSD symptoms [ , ]. So, after the initial, negative reaction, the situation changed, becoming more predictable and less uncertain, and people enacted their resilience and coping strategies, using more positive emotions to overcome the emotional upheaval and resulting again in a more positive way of expressing themselves.
The uncertainty of the outbreak situation, with different restrictions and even contradictory messages in circulation, may also have had an impact on people's analytical thinking and use of words: Reasoning and trying to make sense of events are difficult in such contexts. People might have adopted a more logical thinking style, trying to find meaning in the situation, only during the first and second phases, when things were more stable, the gravity of the emergency became clearer, and a consistent view was reached. This result seems to contrast with the one obtained by Cohn and colleagues [ ], who highlighted a rapid increase in cognitive word use immediately after the attacks; the level returned to baseline after some days and then decreased again. We have to point out that our study and the study by Cohn et al [ ] used different writing samples (tweets vs journal entries) and also different words in the analysis: Although the 2 measures theoretically tap the same construct (a kind of thinking style), analytical thinking is based on function words, while the cognitive processing index used by Cohn and colleagues [ ] reflects words such as because, think, and question. However, we think that these differences in results could also be due to the reasons already explained: September 11 was a sudden, intense, and disruptive event, leading to a rapid need to make sense of what was happening. The outbreak phase of this pandemic, instead, was very different, unfolding more slowly and with uncertainty that persisted for weeks.
As COVID-19 could be considered “the cause of individual and collective traumas”, we discussed our results in light of previous studies concerning both individual trauma (for example, traffic accidents or relationship breakups [ , ]) and collective trauma (for example, the September 11 attacks [ ]). Because these traumas have heterogeneous yet similar consequences for individuals, more research is needed to characterize pandemic-specific psycholinguistic markers of trauma at both the individual and collective levels.
Our study is not exempt from limitations. Our data consist of publicly available Italian tweets, so our results cannot be generalized to Twitter users with private accounts or to the general Italian population. Even though it is used by a considerable number of people (3.7 million users as of January 2020), Twitter is now only the sixth most used social media platform in Italy.
Moreover, we did not collect any information about the users who wrote the analyzed tweets: Some demographic and other characteristics (eg, gender, age, working status, coping strategies) could account for differences in reactions to official COVID-19 data and for different use of words in their tweets. Specifically, some studies showed that even the area from which people tweet could account for some differences in their tweets: Gore et al, for example, showed that geotagged tweets in US areas with lower obesity rates have, among other results, a higher level of happiness. Another study [ ] found that weather, day, and type of activities done during the day affect the emotions expressed in tourists’ tweets.
So, in our specific context, it is plausible that urban areas and their characteristics, days, and seasonal weather influenced the emotional tone and, more generally, the words people used in their tweets.
Implications and Future Work
We think that our study could have relevant implications for actionable policies in the health care context and for future related work expanding our research questions.
These results support the feasibility and importance of infodemiological indicators and psycholinguistic analysis to monitor mental health–related variables in a fast and cost-effective way. While traditional psychology instruments and measures (such as self-reported questionnaires and surveys) provide a one-time measure of the variable of interest in a limited sample, this method could provide longitudinal and population-level data. Bearing in mind the limitations and confounding influences discussed above, this method could be used for active surveillance of the impact of a pandemic, and of the related daily sharing of information, on people’s mental health, providing dynamic knowledge to inform relevant health policies. Knowing in advance or in real time which type of information (such as new daily cases, new daily deaths, or the phase of the pandemic) could have an impact, and how it affects the emotions, analytical thinking, and mental health of a population, could allow the implementation of ad hoc and concrete responses. As a pandemic is constantly and heavily affecting our daily lives and mental health [, , ], we think that monitoring psychological health and intervening to prevent costly consequences or improve well-being with tailored psychological interventions are essential.
Future studies are needed to develop this active surveillance approach into a useful and concrete instrument for institutions and health policy.
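As a purely illustrative sketch of the kind of monitoring described above, the following Python example averages per-tweet scores into daily scores and then summarizes them by phase. It is not the analysis performed in this study, which used general linear models in jamovi; the phase start dates, the function names, and the toy scores are all assumptions introduced only for the example.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical phase start dates, for illustration only; they are not
# taken from the study and should be replaced with the real boundaries.
PHASE_STARTS = [
    (date(2020, 2, 24), "outbreak"),
    (date(2020, 3, 9), "phase 1"),
    (date(2020, 5, 4), "phase 2"),
]


def phase_of(day):
    """Return the label of the latest phase starting on or before `day`."""
    label = None
    for start, name in PHASE_STARTS:
        if day >= start:
            label = name
    return label


def daily_means(tweets):
    """Average per-tweet scores into one score per calendar day."""
    by_day = defaultdict(list)
    for day, score in tweets:
        by_day[day].append(score)
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


def phase_means(daily):
    """Average the daily scores within each phase."""
    by_phase = defaultdict(list)
    for day, score in daily.items():
        by_phase[phase_of(day)].append(score)
    return {phase: mean(scores) for phase, scores in by_phase.items()}


# Toy (made-up) per-tweet emotional tone scores: (date, score).
tweets = [
    (date(2020, 2, 25), 20.0),
    (date(2020, 2, 25), 30.0),
    (date(2020, 3, 20), 40.0),
    (date(2020, 3, 21), 50.0),
    (date(2020, 5, 10), 60.0),
]

daily = daily_means(tweets)
summary = phase_means(daily)
print(summary)
```

A descriptive summary of this kind could be refreshed daily as new tweets arrive; any inferential step, such as relating daily scores to new cases or deaths, would still require a proper statistical model.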
Moreover, as our study contributes to the growing field of infodemiology in the pandemic context, further research could expand our research questions, analyzing and controlling for other factors that could influence word use in tweets in this pandemic period, such as geotagging, days, and seasonal weather [, ], as well as age, gender, working status, and other sociodemographic and spatial-temporal characteristics.
An increasing amount of literature has demonstrated the vast effects this pandemic is having on mental health, emotions, and cognition of the global and Italian populations. However, to the best of our knowledge, this is the first study analyzing psycholinguistic summary variables and their relationships with official COVID-19 Italian data and phases of managing the pandemic in a sample of Italian tweets during the first phases of the pandemic.
Our results give a clear picture of the effects of COVID-19–related data and phases on the emotions, analytical thinking, and somatosensory processes of Italian Twitter users: Specifically, when daily deaths and daily new cases increased, negative emotions and somatosensory words, often linked to traumatic events and PTSD symptoms, increased too. Moreover, emotional tone and analytical thinking were lower during the outbreak phase, which was characterized by uncertainty, and increased during the first and second phases. As new instruments are implemented to monitor patients’ psychological status, having information on how the pandemic may affect the use of words, and on its relationships with psychosocial variables, could be useful for institutions and health policies to develop specific interventions to mitigate the effects of this or future situations on the population’s mental health. Even if more studies are necessary, our results showed the feasibility and importance of infodemiological indicators and psycholinguistic analysis to monitor mental health–related variables in these unprecedented situations.
This work was partially supported by the Italian Ministry of Health with Ricerca Corrente and 5x1000 funds for IEO European Institute of Oncology IRCCS.
LV and GM are PhD students within the European School of Molecular Medicine (SEMM).
DM and LV planned the study, constructed the dataset, performed statistical analysis, and drafted the manuscript. GM and SFMP drafted the manuscript. GP supervised all the processes, provided critical guidance, and revised the manuscript. All authors contributed to the article and approved the submitted version.
Conflicts of Interest
Final dataset. XLSX File (Microsoft Excel File), 15 KB
Jamovi script. DOCX File, 13 KB
- Sehl K. Top Twitter Demographics That Matter to Social Media Marketers. Hootsuite. 2020 May 28. URL: https://blog.hootsuite.com/twitter-demographics/ [accessed 2021-09-14]
- Twitter Announces First Quarter 2020 Results. Twitter. 2020 Apr 30. URL: https://s22.q4cdn.com/826641620/files/doc_financials/2020/q1/Q1-2020-Earnings-Press-Release.pdf [accessed 2021-09-14]
- @TwitterIR. Q1 2020 Letter to Shareholders. URL: http://q4live.s22.clientfiles.s3-website-us-east-1.amazonaws.com/826641620/files/doc_financials/2020/q1/Q1-2020-Shareholder-Letter.pdf [accessed 2021-10-10]
- Eysenbach G. Infodemiology and infoveillance: framework for an emerging set of public health informatics methods to analyze search, communication and publication behavior on the Internet. J Med Internet Res 2009 Mar 27;11(1):e11.
- Sinnenberg L, Buttenheim AM, Padrez K, Mancheno C, Ungar L, Merchant RM. Twitter as a Tool for Health Research: A Systematic Review. Am J Public Health 2017 Jan;107(1):e1-e8.
- Lwin MO, Lu J, Sheldenkar A, Schulz PJ, Shin W, Gupta R, et al. Global Sentiments Surrounding the COVID-19 Pandemic on Twitter: Analysis of Twitter Trends. JMIR Public Health Surveill 2020 May 22;6(2):e19447.
- Xue J, Chen J, Hu R, Chen C, Zheng C, Su Y, et al. Twitter Discussions and Emotions About the COVID-19 Pandemic: Machine Learning Approach. J Med Internet Res 2020 Nov 25;22(11):e20550.
- Su Y, Xue J, Liu X, Wu P, Chen J, Chen C, et al. Examining the Impact of COVID-19 Lockdown in Wuhan and Lombardy: A Psycholinguistic Analysis on Weibo and Twitter. Int J Environ Res Public Health 2020 Jun 24;17(12):1.
- Tausczik YR, Pennebaker JW. The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods. Journal of Language and Social Psychology 2009 Dec 08;29(1):24-54.
- Cohn MA, Mehl MR, Pennebaker JW. Linguistic markers of psychological change surrounding September 11, 2001. Psychol Sci 2004 Oct 01;15(10):687-693.
- Masiero M, Mazzocco K, Harnois C, Cropley M, Pravettoni G. From Individual To Social Trauma: Sources Of Everyday Trauma In Italy, The US And UK During The Covid-19 Pandemic. J Trauma Dissociation 2020 Jul 12;21(5):513-519.
- Marton G, Vergani L, Mazzocco K, Garassino M, Pravettoni G. 2020s Heroes Are Not Fearless: The Impact of the COVID-19 Pandemic on Wellbeing and Emotions of Italian Health Care Workers During Italy Phase 1. Front Psychol 2020;11:588762.
- Monzani D, Gorini A, Mazzoni D, Pravettoni G. Brief report - "Every little thing gonna be all right" (at least for me): Dispositional optimists display higher optimistic bias for infection during the Italian COVID-19 outbreak. Pers Individ Dif 2021 Jan 01;168:110388.
- Fredrickson BL, Tugade MM, Waugh CE, Larkin GR. What good are positive emotions in crisis? A prospective study of resilience and emotions following the terrorist attacks on the United States on September 11th, 2001. Journal of Personality and Social Psychology 2003;84(2):365-376.
- Interpreting LIWC Output. Pennebaker Conglomerates. URL: https://liwc.wpengine.com/interpreting-liwc-output/ [accessed 2021-09-14]
- Crespo M, Fernández-Lansac V. Memory and narrative of traumatic events: A literature review. Psychol Trauma 2016 Mar;8(2):149-156.
- Beaudreau SA. Are trauma narratives unique and do they predict psychological adjustment? J Trauma Stress 2007 Jun;20(3):353-357.
- Jones C, Harvey AG, Brewin CR. The organisation and content of trauma memories in survivors of road traffic accidents. Behav Res Ther 2007 Jan;45(1):151-162.
- Buck N, Kindt M, van den Hout M, Steens L, Linders C. Perceptual Memory Representations and Memory Fragmentation as Predictors of Post-Trauma Symptoms. Behav. Cogn. Psychother 2006 Nov 15;35(3):259-272.
- Banda J, Tekumalla R, Wang G, Yu J, Liu T, Ding Y, et al. A large-scale COVID-19 Twitter chatter dataset for open scientific research -- an international collaboration. Cornell University. URL: http://arxiv.org/abs/2004.03688 [accessed 2021-09-14]
- Arafat M. A Review of Models for Hydrating Large-scale Twitter Data of COVID-19-related Tweets for Transportation Research. Advance: a SAGE preprints community. 2020 Apr 27. URL: https://advance.sagepub.com/articles/preprint/A_Review_of_Models_for_Hydrating_Large-scale_Twitter_Data_of_COVID-19-related_Tweets_for_Transportation_Research/12192693 [accessed 2021-09-14]
- Hadi M, Xiao Y, Iqbal MS, Wang T, Arafat M, Hoque F, Florida International University Lehman Center for Transportation Research, Florida Department of Transportation Office of Research and Development. Estimation of System Performance and Technology Impacts to Support Future Year Planning. United States Department of Transportation. URL: https://rosap.ntl.bts.gov/view/dot/53966 [accessed 2021-09-14]
- Ess C, Jones S. Ethical decision-making and internet research: Recommendations from the AoIR ethics working committee. In: Buchanan E, editor. Readings in Virtual Research Ethics: Issues and Controversies. Hershey, PA: IGI Global; 2004:27-44.
- Agosti A, Rellini A. The Italian LIWC Dictionary. LIWC. 2007. URL: http://liwc.wpengine.com/ [accessed 2021-09-14]
- Pennebaker JW, Boyd RL, Jordan K, Blackburn K. The Development and Psychometric Properties of LIWC2015. University of Texas at Austin. 2015. URL: https://repositories.lib.utexas.edu/bitstream/handle/2152/31333/LIWC2015_LanguageManual.pdf [accessed 2021-09-14]
- Pennebaker JW, Chung CK, Frazee J, Lavergne GM, Beaver DI. When small words foretell academic success: the case of college admissions essays. PLoS One 2014;9(12):e115844.
- GAMLj: General Analyses for the Linear Model in Jamovi. GAMLj. URL: https://gamlj.github.io/ [accessed 2021-09-14]
- Jamovi. URL: https://www.jamovi.org/ [accessed 2021-09-14]
- Rozin P, Royzman EB. Negativity Bias, Negativity Dominance, and Contagion. Pers Soc Psychol Rev 2016 Dec 21;5(4):296-320.
- Williams LM, Gatt JM, Schofield PR, Olivieri G, Peduto A, Gordon E. 'Negativity bias' in risk for depression and anxiety: brain-body fear circuitry correlates, 5-HTT-LPR and early life stress. Neuroimage 2009 Sep;47(3):804-814.
- Kimble M, Batterink L, Marks E, Ross C, Fleming K. Negative expectancies in posttraumatic stress disorder: neurophysiological (N400) and behavioral evidence. J Psychiatr Res 2012 Jul;46(7):849-855.
- Hayes JP, Vanelzakker MB, Shin LM. Emotion and cognition interactions in PTSD: a review of neurocognitive and neuroimaging studies. Front Integr Neurosci 2012;6:89.
- Diagnostic and Statistical Manual of Mental Disorders, 5th ed. Washington, DC: American Psychiatric Association; 2013.
- Boals A, Klein K. Word Use in Emotional Narratives about Failed Romantic Relationships and Subsequent Mental Health. Journal of Language and Social Psychology 2016 Jul 26;24(3):252-268.
- Pennebaker JW, Francis ME. Cognitive, Emotional, and Language Processes in Disclosure. Cognition and Emotion 2010 Sep 10;10(6):601-626.
- Pennebaker JW, Mayne TJ, Francis ME. Linguistic predictors of adaptive bereavement. Journal of Personality and Social Psychology 1997;72(4):863-871.
- Follmer Greenhoot A, Sun S, Bunnell SL, Lindboe K. Making sense of traumatic memories: memory qualities and psychological symptoms in emerging adults with and without abuse histories. Memory 2013 Jan;21(1):125-142.
- Goldberg DP, Gater R, Sartorius N, Ustun TB, Piccinelli M, Gureje O, et al. The validity of two versions of the GHQ in the WHO study of mental illness in general health care. Psychol Med 1997 Jan 01;27(1):191-197.
- D'Andrea W, Chiu PH, Casas BR, Deldin P. Linguistic Predictors of Post-Traumatic Stress Disorder Symptoms Following 11 September 2001. Appl. Cognit. Psychol 2011 Oct 24;26(2):316-323.
- Jelinek L, Randjbar S, Seifert D, Kellner M, Moritz S. The organization of autobiographical and nonautobiographical memory in posttraumatic stress disorder (PTSD). J Abnorm Psychol 2009 May;118(2):288-298.
- Aitken C, Mavridis D. Reasoning under uncertainty. Evid Based Ment Health 2019 Feb 24;22(1):44-48.
- Sadiković S, Branovački B, Oljača M, Mitrović D, Pajić D, Smederevac S. Daily Monitoring of Emotional Responses to the Coronavirus Pandemic in Serbia: A Citizen Science Approach. Front Psychol 2020 Aug 19;11:2133.
- Sheard J, Johnsen BH, Saus ER. Trauma narratives and emotional processing. Scand J Psychol 2005 Dec;46(6):503-510.
- Gore RJ, Diallo S, Padilla J. You Are What You Tweet: Connecting the Geographic Variation in America's Obesity Rate to Twitter Content. PLoS One 2015;10(9):e0133505.
- Padilla JJ, Kavak H, Lynch CJ, Gore RJ, Diallo SY. Temporal and spatiotemporal investigation of tourist attraction visit sentiment on Twitter. PLoS One 2018 Jun 14;13(6):e0198857.
- Kondylakis H, Bucur A, Dong F, Renzi C, Manfrinati A, Graf N, et al. iManageCancer: Developing a platform for empowering patients and strengthening self-management in cancer diseases. 2017 Presented at: 30th IEEE Int Symp on Computer-Based Medical Systems (CBMS); June 22-24, 2017; Thessaloniki, Greece p. 2017.
LIWC: Linguistic Inquiry and Word Count
PTSD: posttraumatic stress disorder
Edited by C Basch; submitted 21.04.21; peer-reviewed by R Gore, J Chen, B Green; comments to author 11.05.21; revised version received 30.06.21; accepted 16.07.21; published 27.10.21
Copyright
©Dario Monzani, Laura Vergani, Silvia Francesca Maria Pizzoli, Giulia Marton, Gabriella Pravettoni. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 27.10.2021.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.