An Evaluation of Geo-located Twitter Data for Measuring Human Migration

Int J Geogr Inf Sci. 2022;36(9):1830-1852. doi: 10.1080/13658816.2022.2075878. Epub 2022 Jun 15.

Abstract

This study evaluates the spatial patterns of flows generated from geo-located Twitter data to measure human migration. Using geo-located tweets continuously collected in the U.S. from 2013 to 2015, we identified Twitter users who migrated per changes in county-of-residence every two years and compared the Twitter-estimated county-to-county migration flows with the ones from the U.S. Internal Revenue Service (IRS). To evaluate the spatial patterns of Twitter migration flows when representing the IRS counterparts, we developed a normalized difference representation index to visualize and identify those counties of over-/under-representations in the Twitter estimates. Further, we applied a multidimensional spatial scan statistic approach based on a Poisson process model to detect pairs of origin and destination regions where the over-/under-representativeness occurred. The results suggest that Twitter migration flows tend to under-represent the IRS estimates in regions with a large population and over-represent them in metropolitan regions adjacent to tourist attractions. This study demonstrated that geo-located Twitter data could be a sound statistical proxy for measuring human migration. Given that the spatial patterns of Twitter-estimated migration flows vary significantly across the geographic space, related studies will benefit from our approach by identifying those regions where data calibration is necessary.

Keywords: geo-located Twitter data; human migration; migration flows; multidimensional Poisson process; spatial scan statistics.