Comparing medical code usage with the compression-based dissimilarity measure

Stud Health Technol Inform. 2007;129(Pt 1):684-8.

Authors

Thomas Brox Røst¹, Ole Edsberg, Anders Grimsmo, Øystein Nytrø

Affiliation

¹ Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway. brox@idi.ntnu.no

PMID: 17911804

Abstract

It is well known that medical coding practice is inconsistent and that differences in usage may exist even at the institutional level. In this paper we introduce a novel method for investigating code usage patterns in clinical documentation corpora. By applying the Compression-based Dissimilarity Measure to calculate similarities between encounter notes, we find that certain notes can be associated with a number of different classifications and that a given classification code can be documented in fundamentally different ways. As a result, some notes need to be understood in the context of the classification code, a finding that has implications for data mining and information extraction tasks. In addition, the method opens up a number of interesting application areas, including highlighting code use anomalies, measuring how coding practice changes over time, comparing code usage across institutions, and, perhaps most importantly, providing valuable feedback to developers of classification coding systems.
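
The abstract does not spell out how the measure is computed. As a minimal sketch, the Compression-based Dissimilarity Measure is commonly defined as CDM(x, y) = C(xy) / (C(x) + C(y)), where C(s) is the compressed size of s. The Python below assumes that definition and uses zlib as the compressor; the compressor choice, the function names, and the example notes are all illustrative assumptions, not the setup used in the paper.

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def cdm(x: str, y: str) -> float:
    """Compression-based dissimilarity: C(xy) / (C(x) + C(y)).

    Lower values mean the two texts share more compressible structure.
    """
    return compressed_size(x + y) / (compressed_size(x) + compressed_size(y))


# Hypothetical encounter notes, invented for illustration only.
notes = {
    "cough_1": "Patient presents with cough and fever for three days. "
               "Prescribed antibiotics and rest.",
    "cough_2": "Patient presents with cough and mild fever for two days. "
               "Advised rest and fluids.",
    "bp_check": "Follow-up of blood pressure control; medication dosage "
                "adjusted, next visit in six months.",
}

# Pairwise dissimilarities between all distinct notes.
pairwise = {
    (a, b): cdm(notes[a], notes[b])
    for a in notes
    for b in notes
    if a < b
}

for pair, score in sorted(pairwise.items(), key=lambda item: item[1]):
    print(pair, round(score, 3))
```

Grouping such pairwise scores by the classification code attached to each note is one way to look for codes whose notes are documented in very different ways. On very short notes the compressor's fixed overhead dominates, so the scores are only indicative.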

Publication types

Comparative Study

MeSH terms

Forms and Records Control
Humans
Mathematics
Medical Records Systems, Computerized / classification*
Primary Health Care / classification*
Vocabulary, Controlled*