Developing a section labeler for clinical documents

Peter J Haug; Xinzi Wu; Jeffery P Ferraro; Guergana K Savova; Stanley M Huff; Christopher G Chute

AMIA Annu Symp Proc. 2014 Nov 14:2014:636-44. eCollection 2014.

Authors

Peter J Haug¹, Xinzi Wu², Jeffery P Ferraro¹, Guergana K Savova³, Stanley M Huff¹, Christopher G Chute⁴

Affiliations

¹ Intermountain Healthcare, Salt Lake City, UT; University of Utah, Salt Lake City, UT.
² Intermountain Healthcare, Salt Lake City, UT.
³ Boston Children's Hospital and Harvard Medical School, Boston, MA.
⁴ Mayo Clinic, Rochester, MN.

PMID: 25954369
PMCID: PMC4419880

Abstract

Natural language processing (NLP) technologies provide an opportunity to extract key patient data from free-text documents within the electronic health record (EHR). We are developing a series of components from which to construct NLP pipelines. These pipelines typically begin with a component whose goal is to label sections within medical documents with codes indicating the anticipated semantics of their content. This Clinical Section Labeler prepares the document for further, focused information extraction. Below we describe the evaluation of six algorithms designed for use in a Clinical Section Labeler. These algorithms are trained with N-gram-based feature sets extracted from document sections and the document types. In the evaluation, six different Bayesian models were trained and used to assign one of 27 different topics to each section. A tree-augmented Bayesian network using the document type and N-grams derived from section headers proved most accurate in assigning individual sections appropriate section topics.
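
The general idea of assigning a topic to a section from header N-grams and the document type can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a plain naive Bayes classifier with add-one smoothing rather than the tree-augmented Bayesian network the abstract reports as most accurate, and the topic codes, document types, and training headers are invented for illustration (the abstract does not list the 27 topics). The names `header_features` and `NaiveBayesSectionLabeler` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def header_features(header, doc_type, n_max=2):
    """Return word N-grams (N = 1..n_max) from a header plus a document-type feature."""
    tokens = header.lower().split()
    feats = ["doctype=" + doc_type]
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NaiveBayesSectionLabeler:
    """Toy naive Bayes labeler; a stand-in for the Bayesian models in the abstract."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # additive smoothing constant
        self.topic_counts = Counter()               # number of sections per topic
        self.feature_counts = defaultdict(Counter)  # topic -> feature -> count
        self.vocab = set()                          # all features seen in training

    def fit(self, examples):
        """examples: iterable of (header, doc_type, topic) triples."""
        for header, doc_type, topic in examples:
            self.topic_counts[topic] += 1
            for feat in header_features(header, doc_type):
                self.feature_counts[topic][feat] += 1
                self.vocab.add(feat)
        return self

    def predict(self, header, doc_type):
        """Return the topic with the highest smoothed log posterior."""
        total = sum(self.topic_counts.values())
        vocab_size = len(self.vocab)
        best_topic, best_score = None, -math.inf
        for topic, count in self.topic_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.feature_counts[topic].values()) + self.alpha * vocab_size
            for feat in header_features(header, doc_type):
                score += math.log((self.feature_counts[topic][feat] + self.alpha) / denom)
            if score > best_score:
                best_topic, best_score = topic, score
        return best_topic


# Invented training data: (section header, document type, topic code).
examples = [
    ("History of Present Illness", "discharge_summary", "HPI"),
    ("Present Illness", "progress_note", "HPI"),
    ("Medications", "discharge_summary", "MEDICATIONS"),
    ("Discharge Medications", "discharge_summary", "MEDICATIONS"),
    ("Allergies", "progress_note", "ALLERGIES"),
    ("Drug Allergies", "discharge_summary", "ALLERGIES"),
]

labeler = NaiveBayesSectionLabeler().fit(examples)
print(labeler.predict("Current Medications", "progress_note"))
```

Treating the document type as just another feature is the simplest way to let it influence the topic. A tree-augmented network would additionally model dependencies between features, which this sketch ignores.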

Publication types

Research Support, U.S. Gov't, P.H.S.

MeSH terms

Algorithms*
Bayes Theorem
Electronic Health Records* / classification
Information Storage and Retrieval
Natural Language Processing*
Semantics
