Comparing medical code usage with the compression-based dissimilarity measure

Stud Health Technol Inform. 2007;129(Pt 1):684-8.

Abstract

It is well known that medical coding practice is inconsistent and that differences in usage may exist even at the institutional level. In this paper we introduce a novel method for investigating code usage patterns in clinical documentation corpora. By applying the Compression-based Dissimilarity Measure to calculate similarities between encounter notes, we find that certain notes can be associated with a number of different classifications and that a given classification code can be documented in fundamentally different ways. As a result, some notes need to be understood in the context of their classification code, a finding which has implications for data mining and information extraction tasks. In addition, the method opens up a number of interesting application areas, including highlighting code use anomalies, measuring how coding practice changes over time, comparing code usage across institutions, and, perhaps most importantly, providing valuable feedback to developers of classification coding systems.
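The abstract does not spell out how the measure is computed. As commonly defined, the Compression-based Dissimilarity Measure of two strings x and y is CDM(x, y) = C(xy) / (C(x) + C(y)), where C gives the compressed size in bytes. The following is a minimal sketch under stated assumptions, not the authors' implementation: the choice of zlib as the compressor, the function names `compressed_size` and `cdm`, and the sample notes are all hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (compressor choice is an assumption)."""
    return len(zlib.compress(data, 9))


def cdm(x: str, y: str) -> float:
    """Compression-based Dissimilarity Measure: C(xy) / (C(x) + C(y)).

    Values near 1 suggest the two texts share little structure;
    lower values suggest that one text helps compress the other.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    return compressed_size(bx + by) / (compressed_size(bx) + compressed_size(by))


if __name__ == "__main__":
    # Hypothetical encounter notes, invented purely for illustration.
    note_a = "Patient presents with cough and fever. Prescribed antibiotics. " * 10
    note_b = "Patient presents with cough and mild fever. Prescribed antibiotics. " * 10
    note_c = "Routine blood pressure check, medication dosage adjusted, follow-up in six weeks. " * 10
    print("similar notes:   ", cdm(note_a, note_b))
    print("dissimilar notes:", cdm(note_a, note_c))
```

Very short notes are dominated by the compressor's fixed overhead, so in practice the scores are more informative for longer texts or when compared relative to one another rather than read as absolute values.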

Publication types

  • Comparative Study

MeSH terms

  • Forms and Records Control
  • Humans
  • Mathematics
  • Medical Records Systems, Computerized / classification*
  • Primary Health Care / classification*
  • Vocabulary, Controlled*