Developing Methodologies to Find Abbreviated Laboratory Test Names in Narrative Clinical Documents by Generating High Quality Q-Grams

Stud Health Technol Inform. 2017:245:452-456.

Abstract

Laboratory test names are basic information for diagnosing diseases. However, in clinical documents this information is usually written in natural language. Lexicon-based methods have been good solutions for finding it, but they cannot find abbreviated expressions that are absent from the lexicon, such as "neuts", which means "neutrophils". Similar-word matching can address this issue, but it tends to produce many false positives, and its processing time grows as the term set gets larger. We therefore propose a novel q-gram-based algorithm, named modified triangular area filtering, that finds abbreviated laboratory test terms in clinical documents while minimizing the risk of impairing the lexicon's precision. The method also found the terms within a reasonable processing time. On the test sets, it achieved a precision of 92.54, a recall of 87.72, and an F1-score of 90.06 with an edit distance threshold (τ) of 3.
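To make the general idea of q-gram filtering under an edit distance threshold concrete, the following is a minimal Python sketch. It is not the modified triangular area filtering algorithm, whose details the abstract does not give. It uses the standard q-gram count filter followed by a Levenshtein check instead. The function names, the toy lexicon, and the misspelled token "platlets" are all assumptions made for illustration.

```python
from collections import Counter


def qgrams(s, q=2):
    """Return the multiset of overlapping q-grams of s (no padding)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def passes_count_filter(a, b, tau, q=2):
    """Standard count filter (a generic technique, assumed here):
    strings within edit distance tau share at least
    max(len(a), len(b)) - q + 1 - q * tau q-grams."""
    shared = sum((qgrams(a, q) & qgrams(b, q)).values())
    needed = max(len(a), len(b)) - q + 1 - q * tau
    return shared >= needed


def find_matches(token, lexicon, tau=3, q=2):
    """Return (term, distance) pairs with distance <= tau, closest first."""
    matches = []
    for term in lexicon:
        if abs(len(term) - len(token)) > tau:
            continue  # length filter: distance is at least the length gap
        if not passes_count_filter(token, term, tau, q):
            continue  # cheap q-gram filter before the costly DP
        d = edit_distance(token, term)
        if d <= tau:
            matches.append((term, d))
    return sorted(matches, key=lambda pair: (pair[1], pair[0]))


# Hypothetical lexicon and token, for illustration only.
lexicon = ["platelets", "neutrophils", "hemoglobin"]
print(find_matches("platlets", lexicon, tau=3))  # [('platelets', 1)]
```

The plain Levenshtein distance between "neuts" and "neutrophils" is 6, so this generic sketch would not link them at τ = 3. How the proposed method handles such cases is not described in the abstract.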

Keywords: Medical Informatics; Medical Informatics Computing; Natural Language Processing.

MeSH terms

  • Algorithms*
  • Humans
  • Language
  • Narration*
  • Natural Language Processing*
  • Unified Medical Language System*