EHR Text Categorization for Enhanced Patient-Based Document Navigation

Markus Kreuzthaler; Bastian Pfeifer; José Antonio Vera Ramos; Diether Kramer; Victor Grogger; Sylvia Bredenfeldt; Markus Pedevilla; Peter Krisper; Stefan Schulz

Stud Health Technol Inform. 2018;248:100-107.

Authors

Markus Kreuzthaler¹, Bastian Pfeifer¹, José Antonio Vera Ramos¹, Diether Kramer², Victor Grogger², Sylvia Bredenfeldt², Markus Pedevilla², Peter Krisper³, Stefan Schulz¹

Affiliations

¹ Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria.
² KAGes Steiermärkische Krankenanstaltengesellschaft m.b.H., Graz, Austria.
³ Division of Nephrology and Dialysis, Department of Internal Medicine, Medical University of Graz, Austria.

PMID: 29726425

Abstract

Patients with multiple disorders usually have long diagnosis lists, consisting of ICD-10 codes together with individual free-text descriptions. Physicians produce these text snippets at the point of care by overwriting standardized ICD code topics. They provide highly compact expert descriptions within a 50-character text field and are frequently not assigned to a specific ICD-10 code. Because these lists are highly redundant, they would benefit from content-based categorization in different hospital-based application scenarios. This work demonstrates how to accurately group diagnosis lists via a combination of natural language processing and hierarchical clustering, achieving an overall F-measure of 0.87. In addition, it compresses the initial diagnosis list by up to 89%. The manuscript discusses pitfalls and challenges as well as the potential of a large-scale approach for tackling this problem.
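
As an illustration only, the general idea of grouping short diagnosis snippets by hierarchical clustering can be sketched as follows. The abstract does not specify the text features, the linkage criterion, or the cut-off, so the character trigrams, Jaccard distance, average linkage, and `threshold` value here are all assumptions, as are the function names and the example snippets. The `compression` helper assumes that "compression" means the relative reduction in the number of list entries, which the abstract does not state explicitly.

```python
from itertools import combinations


def trigrams(text: str) -> set[str]:
    """Character trigrams of a lower-cased, whitespace-normalized snippet."""
    normalized = " ".join(text.lower().split())
    padded = f"  {normalized} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |A and B| / |A or B|; 0.0 means identical trigram sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def cluster(snippets: list[str], threshold: float = 0.6) -> list[list[str]]:
    """Average-linkage agglomerative clustering (assumed setup, not the paper's).

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold`.
    """
    feats = [trigrams(s) for s in snippets]
    clusters = [[i] for i in range(len(snippets))]

    def linkage(c1: list[int], c2: list[int]) -> float:
        total = sum(jaccard_distance(feats[i], feats[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[x], clusters[y]) > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return [[snippets[i] for i in c] for c in clusters]


def compression(n_items: int, n_groups: int) -> float:
    """Relative reduction in list length after grouping (assumed definition)."""
    return 1.0 - n_groups / n_items


# Hypothetical diagnosis list, not taken from the paper's data.
diagnoses = ["Arterial hypertension", "arterial hypertension", "Chronic kidney disease"]
groups = cluster(diagnoses)
print(groups, compression(len(diagnoses), len(groups)))
```

The pairwise search makes this sketch cubic in the number of snippets, which is acceptable for a single patient's diagnosis list but not for a large-scale setting.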

Keywords: Cluster Analysis; Electronic Health Records; Natural Language Processing; Semantics.

MeSH terms

Electronic Health Records*
Humans
International Classification of Diseases*
Natural Language Processing*