Targeting stopwords for quality assurance of SNOMED-CT

Int J Med Inform. 2022 Nov:167:104870. doi: 10.1016/j.ijmedinf.2022.104870. Epub 2022 Sep 17.

Abstract

Objective: We assess the potential of exploiting stopwords in biomedical concept names to complete the logical definitions of concepts that are not sufficiently defined.

Methods: Concepts containing stopwords are selected from the Disorder hierarchy of Systematized NOmenclature of MEDicine (SNOMED-CT). SNOMED-CT consists of two types of concepts: Fully Defined (FD) concepts, which are sufficiently defined, and Partially Defined (PD) concepts, which are not. In this work, FD concepts containing stopwords are treated as a source of ground truth to complete the definitions of lexically and semantically similar PD concepts. FD and PD concepts are lexically and semantically analysed to create sample-sets. For each FD sample-set, mandatory attribute-relationships are calculated using intersection-set logic, which yields a template. PD sample-sets are audited against this mandatory attribute-relationship template to identify inconsistencies in modelling styles and potentially missing attribute-relationships.
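
The intersection-set step and the audit described above could be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the intersection is taken over attribute types (rather than attribute-value pairs), the function names `mandatory_attributes` and `audit` are hypothetical, and the concept names and values are invented toy data rather than actual SNOMED-CT content.

```python
from typing import Dict, Set, Tuple

# One attribute-relationship as an (attribute type, value) pair.
Relationship = Tuple[str, str]


def mandatory_attributes(fd_sample: Dict[str, Set[Relationship]]) -> Set[str]:
    """Return the attribute types shared by every FD concept in a sample-set.

    Assumption: the template is the intersection of attribute types only.
    """
    attr_sets = [{attr for attr, _ in rels} for rels in fd_sample.values()]
    return set.intersection(*attr_sets) if attr_sets else set()


def audit(pd_sample: Dict[str, Set[Relationship]],
          template: Set[str]) -> Dict[str, Set[str]]:
    """Map each PD concept to the template attribute types it lacks."""
    report = {}
    for name, rels in pd_sample.items():
        missing = template - {attr for attr, _ in rels}
        if missing:
            report[name] = missing
    return report


# Toy FD sample-set for the stopword "due to" (invented data).
fd_sample = {
    "Disorder A due to B": {("Due to", "B"), ("Finding site", "X")},
    "Disorder C due to D": {("Due to", "D"), ("Finding site", "Y"),
                            ("Associated morphology", "M")},
}

# Toy PD sample-set with the same lexical pattern (invented data).
pd_sample = {
    "Disorder E due to F": {("Finding site", "Z")},
    "Disorder G due to H": {("Due to", "H"), ("Finding site", "W")},
}

template = mandatory_attributes(fd_sample)
report = audit(pd_sample, template)
```

In this toy case the template contains the two attribute types common to both FD concepts, and only the first PD concept is flagged, because it lacks a "Due to" relationship. A flagged concept is a candidate for review, not a confirmed modelling error.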

Results: Lexical and semantic patterns around 11 stopwords were analysed. A total of 26 sample-sets were extracted for the 11 stopwords. Mandatory attribute-relationships were identified for 24 of the 26 sample-sets. The method identified 62.5%-72.22% of the PD concepts containing the stopwords "in" and "due to" as inconsistent in their modelling style and potentially missing at least one attribute-relationship according to the created template.

Keywords: Biomedical Named Entity Recognition; Biomedical Ontologies; Lexical Auditing; Quality Assurance; SNOMED-CT; Semantic Analysis.

MeSH terms

  • Humans
  • Semantics*
  • Systematized Nomenclature of Medicine*