Identifying Medical Concepts and Semantic Types in Lay Vocabularies of Health Consumers Who are Concerned with Diabetes on Social Media Using the UMLS and NLP

Proc COMPSAC. 2024 Jul:2024:862-869. doi: 10.1109/compsac61105.2024.00119. Epub 2024 Aug 26.

Abstract

This study proposes a way to use an existing medical ontology and natural language processing (NLP) techniques to extract major medical concepts from the lay vocabularies of health consumers on social media, and to group those concepts by the semantic types defined in the ontology. Diabetes-related discussions on Tumblr were used to test how efficiently spaCy and the Markov-Viterbi algorithm map lay medical terms to the medical concepts defined in the UMLS. The system discussed in this paper can better analyze free text, handle word ambiguity, and extract lifestyle indicators from the everyday discussions of people with diabetes on Tumblr. The findings of this study can contribute to developing health applications that track the health behavior of people living with chronic conditions such as diabetes. The approach can also assist researchers who process the lay language of health consumers to better understand their health behavior.
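The abstract does not say how the Markov-Viterbi step is configured. The sketch below shows one plausible setup, under stated assumptions. Each lay term has a list of candidate concepts. A hypothetical emission score rates how likely each term is to map to each candidate, and a hypothetical transition score rates how likely one concept is to follow another. Viterbi decoding then picks the jointly most likely concept sequence, which is how context can resolve an ambiguous term. The concept names and probabilities are toy values for illustration only. They are not UMLS identifiers or results from the study, and the spaCy term-extraction step is omitted.

```python
import math
from typing import Dict, List, Tuple


def viterbi(
    mentions: List[str],
    candidates: Dict[str, List[str]],
    emit: Dict[Tuple[str, str], float],
    trans: Dict[Tuple[str, str], float],
    floor: float = 1e-6,
) -> List[str]:
    """Return the most probable concept sequence for a sequence of lay-term mentions.

    Assumed (hypothetical) inputs:
      candidates[term]           -> candidate concepts for that term
      emit[(term, concept)]      -> score that the term maps to the concept
      trans[(concept, concept)]  -> score that one concept follows another
    Missing scores fall back to a small floor so log() stays defined.
    """
    if not mentions:
        return []
    # best[i][c] = (log-score of the best path ending in concept c at mention i, backpointer)
    first = mentions[0]
    best: List[Dict[str, Tuple[float, str]]] = [
        {c: (math.log(emit.get((first, c), floor)), "") for c in candidates[first]}
    ]
    for i in range(1, len(mentions)):
        term = mentions[i]
        layer: Dict[str, Tuple[float, str]] = {}
        for c in candidates[term]:
            # Best predecessor concept for c, including the transition score.
            prev_c, score = max(
                ((p, s + math.log(trans.get((p, c), floor))) for p, (s, _) in best[i - 1].items()),
                key=lambda t: t[1],
            )
            layer[c] = (score + math.log(emit.get((term, c), floor)), prev_c)
        best.append(layer)
    # Backtrack from the highest-scoring final concept.
    path = [max(best[-1], key=lambda c: best[-1][c][0])]
    for i in range(len(mentions) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


# Toy example: "sugar" alone looks like the food, but next to "insulin"
# the blood-glucose reading is more likely.
candidates = {"sugar": ["Blood glucose", "Sucrose"], "insulin": ["Insulin"]}
emit = {("sugar", "Sucrose"): 0.6, ("sugar", "Blood glucose"): 0.4, ("insulin", "Insulin"): 1.0}
trans = {("Blood glucose", "Insulin"): 0.5, ("Sucrose", "Insulin"): 0.05}
print(viterbi(["sugar"], candidates, emit, trans))             # ['Sucrose']
print(viterbi(["sugar", "insulin"], candidates, emit, trans))  # ['Blood glucose', 'Insulin']
```

Scores are summed in log space to avoid numeric underflow on long posts. The floor value is an arbitrary smoothing choice for unseen term-concept or concept-concept pairs.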

Keywords: Chronic Diseases; Diabetes; Lifestyle Indicators; Natural Language Processing (NLP); Unified Medical Language System (UMLS).