While natural language processing affords researchers an opportunity to automatically scan millions of social media posts, there is growing concern that automated computational tools lack the ability to understand context and nuance in human communication and language. This article introduces a critical systematic approach for extracting culture, context and nuance in social media data. The Contextual Analysis of Social Media (CASM) approach considers and critiques the gap between inadequacies in natural language processing tools and differences in geographic, cultural, and age-related variance of social media use and communication. CASM utilizes a team-based approach to analysis of social media data, explicitly informed by community expertise. We use of CASM to analyze Twitter posts from gang-involved youth in Chicago. We designed a set of experiments to evaluate the performance of a support vector machine using CASM hand-labeled posts against a distant model. We found that the CASM-informed hand-labeled data outperforms the baseline distant labels, indicating that the CASM labels capture additional dimensions of information that content-only methods lack. We then question whether this is helpful or harmful for gun violence prevention.
Keywords: NLP; ethics; qualitative analysis; social science.