Community annotation experiment for ground truth generation for the i2b2 medication challenge

J Am Med Inform Assoc. 2010 Sep-Oct;17(5):519-23. doi: 10.1136/jamia.2010.004200.

Abstract

Objective: Within the context of the Third i2b2 Workshop on Natural Language Processing Challenges for Clinical Records, the authors (also referred to as 'the i2b2 medication challenge team' or 'the i2b2 team' for short) organized a community annotation experiment.

Design: For this experiment, the authors released annotation guidelines and a small set of annotated discharge summaries. They asked the participants of the Third i2b2 Workshop to annotate 10 discharge summaries per person; each discharge summary was annotated by two annotators from two different teams, and a third annotator from a third team resolved disagreements.

Measurements: In order to evaluate the reliability of the annotations thus produced, the authors measured community inter-annotator agreement and compared it with the inter-annotator agreement of expert annotators when both the community and the expert annotators generated ground truth based on pooled system outputs. For this purpose, the pool consisted of the three most densely populated automatic annotations of each record. The authors also compared the community inter-annotator agreement with expert inter-annotator agreement when the experts annotated raw records without using the pool. Finally, they measured the quality of the community ground truth by comparing it with the expert ground truth.

Results and conclusions: The authors found that the community annotators achieved comparable inter-annotator agreement to expert annotators, regardless of whether the experts annotated from the pool. Furthermore, the ground truth generated by the community obtained F-measures above 0.90 against the ground truth of the experts, indicating the value of the community as a source of high-quality ground truth even on intricate and domain-specific annotation tasks.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Electronic Health Records*
  • Humans
  • Information Storage and Retrieval / methods*
  • Natural Language Processing*
  • Patient Discharge
  • Pharmaceutical Preparations*

Substances

  • Pharmaceutical Preparations