CIDER: Context-sensitive polarity measurement for short-form text

PLoS One. 2024 Apr 18;19(4):e0299490. doi: 10.1371/journal.pone.0299490. eCollection 2024.

Abstract

Researchers commonly perform sentiment analysis on large collections of short texts, such as tweets, Reddit posts or newspaper headlines, that all focus on a specific topic, theme or event. Usually, general-purpose sentiment analysis methods are used. These perform well on average but miss the variation in meaning that occurs across contexts: for example, the word "active" has a very different intention and valence in the phrase "active lifestyle" than in "active volcano". This work presents a new approach, CIDER (Context Informed Dictionary and sEmantic Reasoner), which performs context-sensitive linguistic analysis: the valence of sentiment-laden terms is inferred from the whole corpus before being used to score the individual texts. In this paper, we detail the CIDER algorithm and demonstrate that it outperforms state-of-the-art generalist unsupervised sentiment analysis techniques on a large collection of tweets about the weather. CIDER is also applicable to alternative (non-sentiment) linguistic scales. We present a case study on gender in the UK that identifies highly gendered and sentiment-laden days. We have made our implementation of CIDER available as a Python package: https://pypi.org/project/ciderpolarity/.
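The core idea of inferring term valence from the corpus first and only then scoring each text can be illustrated with a small sketch. It is a simplified stand-in and not the CIDER algorithm. It does not use the ciderpolarity package or its API. In its place it uses a seed-word association score in the spirit of Turney and Littman's PMI-based semantic orientation: a word's valence is its positive pointwise mutual information (PPMI) with positive seed words minus its PPMI with negative seed words. The function names, the whitespace tokenizer, the window size and the toy corpora below are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace only.
    return text.lower().split()


def cooccurrence(corpus, window=2):
    """Count (word, neighbour) pairs that occur within `window` tokens of each other."""
    pairs = Counter()
    for text in corpus:
        tokens = tokenize(text)
        for i, w in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    pairs[(w, tokens[j])] += 1
    return pairs


def induce_lexicon(corpus, pos_seeds, neg_seeds, window=2):
    """Step 1: infer a valence for every corpus word from the whole corpus.

    valence(w) = sum of PPMI(w, s) over positive seeds
               - sum of PPMI(w, s) over negative seeds
    """
    pairs = cooccurrence(corpus, window)
    total = sum(pairs.values())
    marginal = Counter()
    for (w, _), n in pairs.items():
        marginal[w] += n

    def ppmi(w, s):
        n = pairs.get((w, s), 0)
        if n == 0:
            return 0.0  # never co-occur within the window: no association
        return max(0.0, math.log(n * total / (marginal[w] * marginal[s])))

    return {
        w: sum(ppmi(w, s) for s in pos_seeds) - sum(ppmi(w, s) for s in neg_seeds)
        for w in marginal
    }


def score_text(text, lexicon):
    """Step 2: score one text as the mean valence of its tokens found in the lexicon."""
    vals = [lexicon[t] for t in tokenize(text) if t in lexicon]
    return sum(vals) / len(vals) if vals else 0.0


# Toy corpora: the same word "active" keeps different company in each.
health = ["active and good", "good active lifestyle", "lazy and bad"]
volcano = ["active volcano bad", "bad active eruption", "calm and good"]

for name, corpus in [("health", health), ("volcano", volcano)]:
    lex = induce_lexicon(corpus, pos_seeds=["good"], neg_seeds=["bad"])
    # "active" comes out positive in the health corpus and negative in the volcano corpus.
    print(name, round(lex["active"], 3), round(score_text("active lifestyle", lex), 3))
```

Because the lexicon is rebuilt for each corpus, the same word can receive opposite valences, which is the effect the "active lifestyle" versus "active volcano" example describes. Swapping the seed lists (for instance, hypothetical gendered seeds such as ["she", "her"] against ["he", "him"]) would point the same two-step procedure at a non-sentiment scale.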

MeSH terms

  • Algorithms
  • Gender Identity
  • Semantics
  • Sentiment Analysis
  • Social Media*

Grants and funding

H.T.P.W. acknowledges funding from UK Natural Environment Research Council (NE/P017436/1). J.C.Y. is funded by a PhD studentship from the UK Engineering and Physical Sciences Research Council. No funding bodies had any influence over the content of this report.