The Twitter Social Mobility Index: Measuring Social Distancing Practices from Geolocated Tweets

Social distancing is an important component of the response to the novel Coronavirus (COVID-19) pandemic. Minimizing social interactions and travel reduces the rate at which the infection spreads, and "flattens the curve" such that the medical system can better treat infected individuals. However, it remains unclear how the public will respond to these policies. This paper presents the Twitter Social Mobility Index, a measure of social distancing and travel derived from Twitter data. We use public geolocated Twitter data to measure how much a user travels in a given week. We find a large reduction in travel in the United States after the implementation of social distancing policies, with larger reductions in states that were early adopters and smaller changes in states without policies. Our findings are presented on http://socialmobility.covid19dataresources.org and we will continue to update our analysis during the pandemic.


Introduction
The outbreak of the SARS-CoV-2 virus, a Coronavirus that causes the disease COVID-19, has caused a pandemic on a scale unseen in a generation. Without an available vaccine to reduce transmission of the virus, public health and elected officials have called on the public to practice social distancing. Social distancing is a set of practices in which individuals maintain a physical distance so as to reduce the number of physical contacts they encounter (Maharaj and Kleczkowski, 2012;Kelso et al., 2009). These practices include maintaining a distance of at least six feet and avoiding large gatherings (Glass et al., 2006). At the time of this writing, in the United States nearly every state has implemented state-wide "stay-at-home" orders to enforce social distancing practices (Zeleny, 2020).
While an important tool in the fight against COVID-19, the implementation of social distancing by the general public can vary widely. While a state governor may issue an order for the practice, individuals in different states may respond in different ways. Understanding actual reductions in travel and social contacts is critical to measuring the effectiveness of the policy. These policies may remain in effect for an extended period of time. Thus, the public may begin to relax their practices, making additional policies necessary. Additionally, epidemiologists already model the impact of social distancing policies on the course of an outbreak (Prem et al., 2020;Fenichel et al., 2011;Caley et al., 2008). These models may be more effective when incorporating actual measures of social distancing, rather than assuming official policies are implemented in practice.
It can be challenging to obtain data on the efficacy of social distancing practices, especially during an ongoing pandemic. A recent Gallup poll surveyed Americans to find that many adults are taking precautions to keep their distance from others (Saad, 2020). However, while polling can provide insights, it cannot provide a solution. Polling is relatively expensive, making it a poor choice for ongoing population surveillance practices and providing data on specific geographic locales, i.e. US States and major cities (Dredze et al., 2016a). Additionally, polling around public health issues suffers from response bias, as individuals may overstate their compliance with established public health recommendations (Adams et al., 1999).
Over the past decade, analyses of social media and web data have been widely adopted to support public health objectives (Paul and Dredze, 2017). In this vein, several efforts have emerged over the past few weeks to track social distancing practices using these data sources. Google has released "COVID-19 Community Mobility Reports" which use Google data to "chart movement trends over time by geography, across different categories of places such as retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential" (Google, 2020). The Unacast "Social Distancing Scoreboard" uses data collected from 127 million monthly active users to measure the implementation of social distancing practices (Unacast, 2020). Researchers at the Institute for Disease Modeling have used data from Facebook's "Data for Good" program to model the decline in mobility in the Greater Seattle area and its effect on the spread of COVID-19 (Burstein et al., 2020). Using cell phone data, the New York Times completed an analysis showing that stay-at-home orders dramatically reduced travel, but that residents of states that waited to enact such orders have continued to travel widely (Glanz et al., 2020). These efforts provide new and important opportunities to study social distancing in real time.
We present the Twitter Social Mobility Index, a measure of social distancing and travel patterns derived from public Twitter data. We use public geolocated Twitter data to measure how much a user travels in a given week. We compute a metric based on the standard deviation of a user's geolocated tweets each week, and aggregate these data over an entire population to produce a metric for the United States as a whole, for individual states and for some US cities. We find that, taking the US as a whole, there has been a dramatic drop in travel in recent weeks, with travel between March 16 and March 29, 2020 showing the lowest amount of travel since January 1, 2019, the start of our dataset. Additionally, we find that travel reductions are not uniform across the United States, but vary from state to state. A key advantage of our approach is that, unlike other travel and social distancing analyses referenced above, we rely on entirely public data, enabling others to replicate our findings and explore different aspects of these data. Additionally, since Twitter contains user generated content in addition to location information, future analyses can correlate attitudes, beliefs, and behaviors with changes in social mobility.
Our findings are presented on http://socialmobility.covid19dataresources.org and we will continue to update our analysis during the pandemic.

Data
Twitter offers several ways in which a user can indicate their location. If a user is tweeting from a GPS-enabled device, they can attach their exact coordinates to that tweet. Twitter may then display to the user, and provide in their API, the specific place that corresponds to these coordinates. Alternatively, a user can explicitly select a location, which can be a point of interest (coffee shop), a neighborhood, a city, state, or country. If the tweet is public, this geolocation information is supplied with the tweet.
We used the Twitter streaming API to download tweets based on location. We used a bounding box that covered the entire United States, including territories. We used data from this collection starting on January 1, 2019 and ending on March 29, 2020. In total, this included 4,600,287 Twitter users and 544,546,651 tweets.

Location Data
We process the two types of geolocation information described in the previous section.
Coordinates The exact coordinates (latitude/longitude) provided by the user ("coordinates" field in the Twitter JSON object). About 8.02% of our data included "coordinates".
Place The "place" field in the Twitter JSON object indicates a known location in which the tweet was authored. A place can be a point of interest (a specific hotel), a neighborhood ("Downtown Jacksonville"), a city ("Kokomo, IN"), a state ("Arizona") or a country ("United States"). The place object contains a unique ID, a bounding box, the country and a name. More information about the location is available from the Twitter GEO API. A place is available with a tweet under either of two conditions. First, Twitter identifies the coordinates provided by the user as occurring in a known place. Second, the user manually selects the place when authoring the tweet.
Since coordinates give a more precise location, we use them instead of place when available. If we only have a place, we assume that the user is in the center of the place, as given by the place's bounding box.
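The rule above can be illustrated with a minimal sketch. It assumes the tweet has already been parsed into a Python dictionary, that "coordinates" holds a GeoJSON point in [longitude, latitude] order, and that the place's "bounding_box" holds a single ring of corner points. The function name tweet_location is hypothetical, and taking the midpoint of the extreme latitudes and longitudes is one plausible reading of "the center of the place", not necessarily the exact computation used in our pipeline.

```python
from typing import Optional, Tuple


def tweet_location(tweet: dict) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) for a tweet, or None if it has no geolocation.

    Hypothetical helper: prefers exact coordinates, falls back to the
    center of the place's bounding box.
    """
    coords = tweet.get("coordinates")
    if coords and coords.get("coordinates"):
        lon, lat = coords["coordinates"]  # GeoJSON order: [longitude, latitude]
        return (lat, lon)

    place = tweet.get("place")
    if place and place.get("bounding_box"):
        corners = place["bounding_box"]["coordinates"][0]  # one ring of corners
        lons = [corner[0] for corner in corners]
        lats = [corner[1] for corner in corners]
        # Assumption: "center" means the midpoint of the box extents.
        return ((min(lats) + max(lats)) / 2, (min(lons) + max(lons)) / 2)

    return None
```

A tweet carrying both fields resolves to its exact coordinates, and a tweet with only a place resolves to the middle of that place's box.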
For points of interest and neighborhoods, Twitter only provides the country in the associated metadata. While in some cases the city can be parsed from the name, and the state inferred, we opted to exclude these places from our analysis. The full location details can be obtained from querying the Twitter API, but the magnitude of data in our analysis made this too time consuming. This excluded about 1.8% of our data.
We include an analysis of New York City. For this analysis, we include places that corresponded to each of the five New York City boroughs (Brooklyn, Manhattan, Queens, Staten Island, The Bronx). We also included points of interest that had in the name the words "New York City", e.g. "New York City Center".
In summary, for each geolocated tweet we have an associated latitude and longitude.

Computing Mobility
We define the Twitter Social Mobility Index as follows. For each user, we collect all locations (coordinates) in a one week period, where a week starts on Monday and ends the following Sunday. We compute the centroid of all of the coordinates and consider this the "home" location for the user for that week. We then measure the distance between each location and the centroid for that week. For distance, we measure the geodesic distance in kilometers between two adjacent records using geopy (https://github.com/geopy/geopy). After collecting the distances we measure the standard deviation of these distances. In summary, this measure reflects the area of travel for a user, rather than the raw distance traveled. Therefore, a user who takes a long trip with a small number of check-ins would have a larger social mobility measure than a user with many check-ins who traveled in a small area.
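The per-user computation can be sketched as follows. This is an illustration under stated assumptions rather than our exact implementation: it substitutes a standard-library haversine (great-circle) distance for geopy's geodesic distance so that the block is self-contained, it takes the centroid to be the arithmetic mean of latitudes and longitudes, and it uses the population standard deviation. The names haversine_km and weekly_mobility are hypothetical.

```python
import math
from statistics import pstdev
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points.

    Stand-in for a geodesic distance; slightly less accurate on an ellipsoid.
    """
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def weekly_mobility(points: List[Tuple[float, float]]) -> float:
    """Standard deviation of distances from one user's weekly centroid, in km."""
    if len(points) < 2:
        raise ValueError("need at least 2 geolocated tweets in the week")
    # Assumption: centroid is the arithmetic mean of latitudes and longitudes.
    centroid = (sum(p[0] for p in points) / len(points),
                sum(p[1] for p in points) / len(points))
    distances = [haversine_km(p, centroid) for p in points]
    # Assumption: population standard deviation.
    return pstdev(distances)
```

Under these assumptions, a user with three check-ins in New York and one in Los Angeles receives a much larger index than a user whose check-ins all fall within a few city blocks, matching the behavior described above.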
We aggregate the results by week by taking the mean measure of all users in a given geographic area. We also record the variance of these measures to study the travel variance in the population, which will indicate if some users are not reducing travel.
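A minimal sketch of this aggregation step is shown below. It assumes each record is a tuple of week label, geographic area, and one user's mobility index for that week; the function name aggregate and the use of the population variance are illustrative choices, not details fixed by the method.

```python
from collections import defaultdict
from statistics import mean, pvariance
from typing import Dict, Iterable, List, Tuple


def aggregate(records: Iterable[Tuple[str, str, float]]
              ) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Map (week, area) to (mean, variance) of the per-user mobility indices."""
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for week, area, value in records:
        groups[(week, area)].append(value)
    # Assumption: population variance over the users observed that week.
    return {key: (mean(values), pvariance(values))
            for key, values in groups.items()}
```

A high variance alongside a low mean would indicate that most users in an area have reduced travel while a minority still travel widely.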
We produce aggregate scores by geographic area for the United States as a whole, for each US state and territory, and for New York City. We determine the geographic area of a user based on their centroid location for that week. This allows users to move geographic areas by week.
We compute the social mobility index for each week between January 1, 2019 and March 29, 2020. We select the date of March 16, 2020 as the start of social distancing on the national level, though individual states have implemented practices at different times. Therefore, we divide the data into two time periods: before social distancing (January 1, 2019 to March 15, 2020) and after social distancing (March 16, 2020 to March 29, 2020).
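The weekly bucketing and the before/after split can be sketched as below. The helper names week_start and period are hypothetical; the only facts taken from the text are that weeks run Monday through Sunday and that March 16, 2020 (itself a Monday) marks the start of the second period.

```python
from datetime import date, timedelta

DISTANCING_START = date(2020, 3, 16)  # national start date used in this paper


def week_start(day: date) -> date:
    """Return the Monday that begins the Monday-to-Sunday week containing `day`."""
    return day - timedelta(days=day.weekday())  # weekday(): Monday == 0


def period(day: date) -> str:
    """Label a tweet date as falling before or after social distancing."""
    return "after" if week_start(day) >= DISTANCING_START else "before"
```

Because the cutoff falls on a Monday, no week straddles the two periods.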
We then compute the reduction in social mobility as follows:
\[ \text{Mobility Reduction} = 1 - \frac{\text{mobility after social distancing}}{\text{mobility before social distancing}} \tag{1} \]
To handle sparse data issues in our dataset, we exclude (1) users with fewer than 3 geolocated tweets overall, and (2) a weekly record for a user if that user has fewer than 2 geolocated tweets in that week. Additionally, due to data loss in our data collection process, we remove two weeks with far less data than other weeks: January 20 to January 27, 2020 and February 17 to February 24, 2020.
Table 1 shows the Twitter Social Mobility Index measured in kilometers for every state and territory in the United States, New York City, and the United States as a whole. We also include the rank of each state or territory.
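The reduction in Equation (1) and the two sparsity filters can be sketched as follows. The function names, the nested dictionary layout of per-week tweet counts, and the default thresholds as keyword arguments are illustrative assumptions; the sketch also takes the before and after mobility values as already-aggregated numbers, since how they are pooled across weeks is not part of the formula itself.

```python
from typing import Dict


def mobility_reduction(before: float, after: float) -> float:
    """Equation (1): 1 - after / before, as a fraction (0.5 means a 50% drop)."""
    if before <= 0:
        raise ValueError("mobility before social distancing must be positive")
    return 1.0 - after / before


def filter_sparse(weekly_counts: Dict[str, Dict[str, int]],
                  min_total: int = 3,
                  min_per_week: int = 2) -> Dict[str, Dict[str, int]]:
    """Drop sparse users and sparse weekly records.

    weekly_counts maps user -> {week label -> number of geolocated tweets}.
    """
    kept: Dict[str, Dict[str, int]] = {}
    for user, weeks in weekly_counts.items():
        if sum(weeks.values()) < min_total:
            continue  # rule (1): too few geolocated tweets overall
        dense = {week: n for week, n in weeks.items() if n >= min_per_week}
        if dense:  # rule (2): keep only weeks with enough tweets
            kept[user] = dense
    return kept
```

For example, a mobility of 10 km before and 5 km after yields a reduction of 0.5, which would be reported as 50%.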

Results
We highlight a few observations. First, the overall drop in mobility across the United States was large: 48.77%. Second, every US state and territory saw a drop in mobility, with reductions ranging from 21.54% to 63.13% relative to travel before March 16, 2020. However, the variance by state was high. States that were early adopters of social distancing practices rank highly on the reduction in travel, e.g. Washington (3) and Maryland (5). In contrast, the states that do not currently have statewide orders (Zeleny, 2020) mostly rank lower: Arkansas (37), Iowa (50), Nebraska (29), North Dakota (13), South Carolina (34), South Dakota (39), Oklahoma (52), Utah (17), Wyoming (53).
We measured two cities: New York City (51.16%) and Washington, DC (63.13%). The fact that Washington, DC ranks first in Table 1 is consistent with the result in Unacast (2020), where DC has the highest grade (A-) as of March 29, 2020. More work is needed to understand how to compare US states with cities.
Figure 1 shows the weekly social mobility index for the United States for the entire time period of our dataset. The figure reflects a massive drop in mobility starting in March, with the two most recent weeks the lowest on record in our dataset. Finally, Figure 2 shows a box plot of the mobility variance across all users in a given time period. The distribution is long-tailed with many zeros, so we take the log of 1 plus each mobility index. While mobility is reduced in general, some users are still showing a lot of movement, suggesting that social distancing is not being uniformly practiced. These results demonstrate that our metric can track drops in travel, suggesting that it can be used as part of ongoing pandemic response planning.
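The transform applied before plotting Figure 2 can be sketched in a few lines. The function name log_mobility is hypothetical; the transform itself, the natural log of 1 plus each index, is as described above, and it keeps zero-valued indices at zero while compressing the long tail.

```python
import math
from typing import Iterable, List


def log_mobility(values: Iterable[float]) -> List[float]:
    """Apply log(1 + x) to each non-negative mobility index."""
    return [math.log1p(value) for value in values]
```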

Related Work
There is a long line of work on geolocation prediction for Twitter, which requires inferring a location for a specific tweet or user (Dredze et al., 2013;Zheng et al., 2018;Han et al., 2014;Pavalanathan and Eisenstein, 2015). This includes work on patterns and trends in Twitter geotagged data (Dredze et al., 2016c). While most of this work focused on inferring a location for a user, and is thus not suitable for tracking a user's movements, there may be opportunities to combine these methods with our approach.
There have been many studies that have analyzed Twitter geolocation data to study population movements. Hawelka et al. (2014) demonstrated a method for computing global travel patterns from Twitter, and Dredze et al. (2016b) adapted this method to support efforts in combating the Zika epidemic.

Conclusion
We presented the Twitter Social Mobility Index, a measure of social mobility based on public Twitter geolocated tweets. Our analysis shows that overall in the United States there has been a large drop in mobility. However, the drop is inconsistent and varies significantly by state. Anecdotally, states that were early adopters of social distancing practices have more significant drops than states that have not yet implemented these practices.
Our work on this data is ongoing, and there are several directions that warrant further study. First, our analysis does not incorporate state-by-state variations in when social distancing was implemented. Second, we only include two cities in our current analysis, but our approach can be extended to dozens of US metro areas. Third, we focused on the United States, but Twitter data provides sufficient coverage for many countries to replicate our analysis. Finally, for each user in the dataset there exists tweet content that can reflect a user's attitudes, beliefs and behaviors. Studying these together with their mobility reduction could yield further insights.
Our findings are presented on http://socialmobility.covid19dataresources.org and we will continue to update our analysis during the pandemic.