Crowdsourcing Twitter annotations to identify first-hand experiences of prescription drug use

J Biomed Inform. 2015 Dec;58:280-287. doi: 10.1016/j.jbi.2015.11.004. Epub 2015 Nov 7.

Abstract

Self-reported patient data has been shown to be a valuable knowledge source for post-market pharmacovigilance. In this paper we propose using the popular micro-blogging service Twitter to gather evidence about adverse drug reactions (ADRs), after first identifying micro-blog messages (also known as "tweets") that report first-hand experience. To achieve this goal, we explore machine learning with data crowdsourced from lay annotators. With the help of lay annotators recruited through CrowdFlower, we manually annotated 1548 tweets containing keywords related to two kinds of drugs: SSRIs (e.g. Paroxetine) and cognitive enhancers (e.g. Ritalin). Our results show that inter-annotator agreement (Fleiss' kappa) for the crowdsourced annotations ranks in moderate agreement with that of a pair of experienced annotators (Spearman's Rho = 0.471). We used the gold-standard annotations from CrowdFlower to train a range of supervised machine learning models to recognize first-hand experience automatically. F-Score values are reported for six of these techniques, with the Bayesian Generalized Linear Model performing best (F-Score = 0.64, Informedness = 0.43) when combined with a feature subset selected using information gain criteria.
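The evaluation quantities named in the abstract (Fleiss' kappa, F-Score, Informedness, information gain) can be illustrated with a short sketch. This is not the authors' code. It assumes binary labels (first-hand experience vs. not), assumes that "F-Score" means the balanced F1 score, and uses the standard textbook definitions: Fleiss' kappa over per-item category counts, Informedness as sensitivity + specificity - 1, and information gain as the reduction in label entropy given a feature. The function names and the toy inputs are hypothetical and exist only to exercise the formulas.

```python
from collections import Counter
from math import log2


def fleiss_kappa(ratings):
    """Fleiss' kappa for items that were each rated by the same number of raters.

    `ratings` holds per-item category counts, e.g. [2, 1] means two raters
    chose category 0 and one rater chose category 1 for that item.
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_cats = len(ratings[0])
    # Observed agreement for each item, then averaged over items.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_bar = sum(p_items) / n_items
    # Expected chance agreement from the overall category proportions.
    total = n_items * n_raters
    p_cats = [sum(row[j] for row in ratings) / total for j in range(n_cats)]
    p_e = sum(p * p for p in p_cats)
    return (p_bar - p_e) / (1 - p_e)


def f_score_and_informedness(y_true, y_pred):
    """Balanced F1 and Informedness (sensitivity + specificity - 1) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return f1, recall + specificity - 1


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum over x of P(X = x) * H(Y | X = x)."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


if __name__ == "__main__":
    # Toy data only: four tweets, three raters each, two categories.
    print(round(fleiss_kappa([[2, 1], [1, 2], [3, 0], [0, 3]]), 3))  # prints 0.333
    f1, inf = f_score_and_informedness([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
    print(round(f1, 3), round(inf, 3))  # prints 0.667 0.333
    # A feature that perfectly separates the labels carries 1 bit of information.
    print(information_gain([1, 1, 0, 0], [1, 1, 0, 0]))  # prints 1.0
```

Informedness is reported alongside F-Score because, unlike F1, it also rewards correct rejection of negatives (tweets that do not report first-hand experience), so it is less sensitive to class imbalance. In a feature-selection setting, one would rank candidate features by `information_gain` and keep the top-scoring subset; how the paper chose its cut-off is not stated in the abstract.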

Keywords: Crowdsourcing; Natural language processing; Pharmacovigilance; Twitter.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Crowdsourcing*
  • Drug Prescriptions*
  • Humans
  • Social Media*