Despite an abundance of information in clinical genetic testing reports, information is oftentimes not well documented/utilized for decision making. Unstructured information in genetic reports can contribute to long-term patient management and future translational research. Thus, we proposed a knowledge model that could manage unstructured information in medical genetic reports and facilitate knowledge extraction, curation and updating. For this pilot study, we used a dataset including 1,565 cancer genetics reports of Mayo Clinic patients. We used a previously developed, data-driven discovery pipeline that involves both semantic annotation and co-occurrence association analysis to establish a knowledge model. We showed that compared to genetic reports, around 56% of testing results are missing or incomplete in the clinical notes. We built a genetic report knowledge model and highlighted four key semantic groups including "Genes and Gene Products" and "Treatments". Coverage of term annotation was 99.5%. Accuracies of term annotation and relationship extraction were 98.9% and 92.9% respectively.
©2020 AMIA - All rights reserved.