Mining for associations between categorical data items in a clinical data repository

AMIA Annu Symp Proc. 2007 Oct 11:945.

Abstract

We present preliminary work on using simple two-way categorical tests to discover associations between categorical items in a clinical data repository. Initial results using the chi-square test yielded diagnosis code associations that seemed plausible, as well as several that did not. The implausible associations may be due in part to the effect of sample size. Tests more resistant to the effects of sample size may yield a higher fraction of plausible diagnosis code associations.
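The sample-size effect mentioned above can be illustrated with a small sketch. It is not the authors' code: the function name `chi_square_2x2`, the example counts, and the use of the phi coefficient as a size-independent measure of association are all assumptions made for illustration. The sketch computes the Pearson chi-square statistic for a 2x2 table of two diagnosis codes (present or absent), then repeats the calculation with every cell multiplied by 100.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p_value, phi). The counts are hypothetical co-occurrence
    counts, e.g. a = patients with both codes, d = patients with neither.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    denom = row1 * row2 * col1 * col2
    if denom == 0:
        raise ValueError("every row and column needs a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    # Phi coefficient: association strength that does not grow with n.
    phi = math.sqrt(chi2 / n)
    return chi2, p_value, phi


# Illustrative counts only (not data from the study).
small = (30, 70, 25, 75)
large = tuple(100 * x for x in small)  # same proportions, 100x the patients

for label, table in (("small", small), ("large", large)):
    chi2, p, phi = chi_square_2x2(*table)
    print(f"{label}: chi2={chi2:.3f} p={p:.3g} phi={phi:.4f}")
```

Because the cell proportions are identical in both tables, the chi-square statistic grows by a factor of 100 while phi stays the same. In a large repository, even a very weak association can therefore reach statistical significance, which is one way implausible code pairs could surface.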

MeSH terms

  • Chi-Square Distribution
  • Humans
  • Information Storage and Retrieval*
  • International Classification of Diseases*
  • Registries*