EHR problem list clustering for improved topic-space navigation

BMC Med Inform Decis Mak. 2019 Apr 4;19(Suppl 3):72. doi: 10.1186/s12911-019-0789-9.

Abstract

Background: The amount of patient-related information within clinical information systems accumulates over time, especially for patients with chronic diseases who have many hospitalizations and consultations. The diagnosis or problem list is an important feature of the electronic health record, which provides a dynamic account of a patient's current illness and past history. In the case of an Austrian hospital network, problem list entries are limited to fifty characters and are potentially linked to ICD-10. The requirement to produce ICD codes at each hospital stay, together with the length limitation of list items, leads to highly redundant problem lists, which conflicts with the physicians' need to get a good overview of a patient in a short time. This paper investigates a method by which problem list items can be semantically grouped to allow fast navigation through patient-related topic spaces.

Methods: We applied a minimal language-dependent preprocessing strategy and mapped problem list entries as tf-idf weighted character 3-grams into a numerical vector space. Based on this representation, we used the unweighted pair group method with arithmetic mean (UPGMA) clustering algorithm with cosine distances and inferred an optimal boundary in order to form semantically consistent topic spaces, taking into consideration different levels of dimensionality reduction via latent semantic analysis (LSA).
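
The pipeline described in the Methods can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes lower-casing as the only preprocessing step, a smoothed idf variant, and a fixed cut-off `threshold` standing in for the inferred optimal boundary. The function names and the example entries are hypothetical, and the LSA step is omitted.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the character n-grams of a lower-cased string."""
    t = text.lower()
    return [t[i:i + n] for i in range(len(t) - n + 1)]


def tfidf_vectors(docs, n=3):
    """Map each entry to a sparse, L2-normalised tf-idf vector (a dict)."""
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    df = Counter(g for c in counts for g in c)  # document frequency per n-gram
    n_docs = len(docs)
    vectors = []
    for c in counts:
        # Smoothed idf: one common variant, assumed here for illustration.
        v = {g: tf * (math.log((1 + n_docs) / (1 + df[g])) + 1)
             for g, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in v.values())) or 1.0
        vectors.append({g: w / norm for g, w in v.items()})
    return vectors


def cosine_distance(a, b):
    """1 - cosine similarity of two unit-length sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return 1.0 - sum(w * b.get(g, 0.0) for g, w in a.items())


def upgma(vectors, threshold):
    """Average-linkage (UPGMA) clustering, stopped at a distance threshold."""
    n = len(vectors)
    dist = {(i, j): cosine_distance(vectors[i], vectors[j])
            for i in range(n) for j in range(i + 1, n)}

    def avg(c1, c2):
        # UPGMA linkage: mean of all pairwise distances between two clusters.
        total = sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best, a, b = min((avg(clusters[x], clusters[y]), x, y)
                         for x in range(len(clusters))
                         for y in range(x + 1, len(clusters)))
        if best > threshold:  # closest pair is too far apart: stop merging
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical problem list entries (not taken from the study data).
entries = ["Diabetes mellitus Typ 2", "DIABETES MELLITUS TYP 2",
           "Arterielle Hypertonie"]
for cluster in upgma(tfidf_vectors(entries), threshold=0.5):
    print([entries[i] for i in cluster])
```

Character n-grams make the representation tolerant of abbreviations and spelling variants, which matter when entries are capped at fifty characters. This naive merge loop recomputes linkage distances on every pass and is only suitable for small lists.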

Results: With the proposed clustering approach, evaluated in an intra-patient and an inter-patient scenario in combination with a natural language pipeline, we achieved an average compression rate of 80% of the initial list items, forming consistent semantic topic spaces with an F-measure greater than 0.80 in both cases. The average number of identified topics in the intra-patient case (μIntra = 78.4) was slightly lower than in the inter-patient case (μInter = 83.4). LSA-based feature space reduction had no significant positive performance impact in our investigations.
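
The abstract does not define the compression rate. One plausible reading, assumed here purely for illustration, is the share of list items saved when topics are shown instead of individual entries. The function name and the counts in the example are hypothetical and are not study data.

```python
def compression_rate(n_items, n_topics):
    """Assumed definition: 1 - (number of topics / number of list items)."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return 1.0 - n_topics / n_items


# Hypothetical counts: 400 list items grouped into 80 topics.
print(f"{compression_rate(400, 80):.0%}")
```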

Conclusions: The investigation presented here is centered on a data-driven solution to the known problem of information overload, which causes ineffective human-computer interactions at clinicians' workplaces. This problem is addressed by navigable disease topic spaces in which related items are grouped and the topics can be accessed more easily.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Austria
  • Cluster Analysis*
  • Data Management / methods*
  • Electronic Health Records*
  • Humans
  • International Classification of Diseases
  • Semantics
  • User-Computer Interface