Objective: To develop a method that uses the UMLS Metathesaurus to extract concepts representing signs and symptoms found in clinical text and to categorize them by anatomically related organ system. The overarching goal is to classify patient-reported symptoms by organ system for population health and epidemiological analyses.
Materials and methods: Guided by the concepts' semantic types and inter-concept relationships, a selected portion of the UMLS Metathesaurus was traversed, starting from the concepts representing the highest-level organ systems. The traversed concepts were narrowed to those representing clinical signs and symptoms by blocking deviations during traversal, pruning superfluous concepts, and manual review. The resulting mapping process was applied to signs and symptoms annotated in a corpus of 750 clinical notes.
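The traversal described above can be illustrated with a minimal sketch. It is not the authors' implementation. The concept identifiers, relationship labels, and allow/block lists are hypothetical, and a small in-memory dictionary stands in for the Metathesaurus. Only the semantic type names are taken from the UMLS Semantic Network.

```python
from collections import deque

# Toy stand-in for Metathesaurus content. All identifiers and relationship
# labels are hypothetical; only the semantic type names are real UMLS types.
CONCEPTS = {
    "C_CARDIO": {"type": "Body System",
                 "edges": [("has_part", "C_HEART"),
                           ("has_finding", "C_CHEST_PAIN"),
                           ("studied_by", "C_CARDIOLOGY")]},
    "C_HEART": {"type": "Body Part, Organ, or Organ Component",
                "edges": [("has_finding", "C_PALPITATION"),
                          ("associated_with", "C_BETA_BLOCKER")]},
    "C_CARDIOLOGY": {"type": "Biomedical Occupation or Discipline", "edges": []},
    "C_CHEST_PAIN": {"type": "Sign or Symptom", "edges": []},
    "C_PALPITATION": {"type": "Sign or Symptom", "edges": []},
    "C_BETA_BLOCKER": {"type": "Pharmacologic Substance",
                       "edges": [("has_finding", "C_FATIGUE")]},
    "C_FATIGUE": {"type": "Sign or Symptom", "edges": []},
}

# Assumed filters: which relationships may be followed, which semantic types
# stop the walk ("blocking deviations"), and which types are collected.
ALLOWED_RELATIONS = {"has_part", "has_finding", "associated_with"}
BLOCKED_TYPES = {"Pharmacologic Substance"}
TARGET_TYPES = {"Sign or Symptom"}


def collect_symptoms(root):
    """Breadth-first walk from an organ-system concept, returning the
    reachable concepts whose semantic type is in TARGET_TYPES."""
    seen = {root}
    queue = deque([root])
    found = set()
    while queue:
        cui = queue.popleft()
        concept = CONCEPTS[cui]
        if concept["type"] in TARGET_TYPES:
            found.add(cui)
        for relation, target in concept["edges"]:
            if relation not in ALLOWED_RELATIONS:
                continue  # do not follow relationships outside the allow list
            if target in seen or CONCEPTS[target]["type"] in BLOCKED_TYPES:
                continue  # skip visited concepts and blocked semantic types
            seen.add(target)
            queue.append(target)
    return found


symptoms_by_system = {"cardiovascular": collect_symptoms("C_CARDIO")}
```

In this toy graph the walk never reaches C_FATIGUE, because the only path to it passes through a concept of a blocked semantic type. The manual review step has no counterpart in the sketch.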
Results: The mapping process yielded a total of 91,000 UMLS concepts (with approximately 300,000 descriptions) that possibly represent physical and mental signs and symptoms, each categorized to its anatomically related organ system. Of the 1864 distinct descriptions of signs and symptoms found in the 750-document corpus, 1635 (88%) were successfully mapped to the set of concepts extracted from the UMLS. Of the 668 unique concepts mapped, 603 (90%) were correctly categorized to their organ systems.
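The sketch below shows one way such a mapping rate could be computed. It assumes annotated descriptions are matched to the extracted concept set by normalized exact string lookup, which is an assumption: the abstract does not state the matching strategy. The lookup table and sample descriptions are hypothetical. The last two lines only recompute the reported percentages from the reported counts.

```python
def normalize(text):
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(text.lower().split())


# Hypothetical excerpt of the extracted set: description -> organ system.
DESCRIPTION_TO_SYSTEM = {
    "chest pain": "cardiovascular",
    "palpitations": "cardiovascular",
    "shortness of breath": "respiratory",
}


def map_descriptions(descriptions):
    """Split descriptions into those found in the lookup table and the rest."""
    mapped, unmapped = {}, []
    for description in descriptions:
        key = normalize(description)
        if key in DESCRIPTION_TO_SYSTEM:
            mapped[description] = DESCRIPTION_TO_SYSTEM[key]
        else:
            unmapped.append(description)
    return mapped, unmapped


mapped, unmapped = map_descriptions(["Chest  pain", "palpitations", "feels off"])
coverage = len(mapped) / (len(mapped) + len(unmapped))

# Recomputing the reported rates from the reported counts.
mapping_rate = round(100 * 1635 / 1864)       # 88
categorization_rate = round(100 * 603 / 668)  # 90
```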
Conclusion: We present a process that facilitates mapping of signs and symptoms to their organ systems. By providing a smaller set of UMLS concepts to use for comparing and matching patient records, this method has the potential to increase efficiency of information extraction pipelines.
Keywords: Information extraction; Semantic mapping; Symptoms; UMLS Metathesaurus.
Published by Elsevier Inc.