Knowledge enhanced LSTM for coreference resolution on biomedical texts

Bioinformatics. 2021 Sep 9;37(17):2699-2705. doi: 10.1093/bioinformatics/btab153.

Abstract

Motivation: Bio-entity coreference resolution focuses on identifying coreferential links in biomedical texts, which is crucial for completing the attributes of bio-events and interconnecting events into bio-networks. Previously, deep neural network-based systems developed for the general domain, among the most powerful tools for this task, have been applied to the biomedical domain with domain-specific information integrated. However, such methods may introduce considerable noise because they do not adequately combine context with complex domain-specific information.

Results: In this article, we explore how to leverage an external knowledge base in a fine-grained way to better resolve coreference by introducing a knowledge-enhanced Long Short-Term Memory network (LSTM), which can encode knowledge information more flexibly inside the LSTM. Moreover, we propose a knowledge attention module that effectively extracts informative knowledge based on context. Our method achieves state-of-the-art performance on the BioNLP and CRAFT datasets, with gains of 7.5 F1 on BioNLP and 10.6 F1 on CRAFT. Additional experiments also demonstrate superior performance on cross-sentence coreferences.
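The abstract does not give the exact formulation of the knowledge attention module. As a rough illustration only, the sketch below assumes plain dot-product attention: each candidate knowledge embedding (for example, vectors for knowledge-base entries linked to a mention) is scored against a context vector such as an LSTM hidden state, the scores are softmax-normalized, and the weighted sum serves as the context-selected knowledge. The function names and the scoring choice are hypothetical and are not taken from the authors' code.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max score first)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def knowledge_attention(context: Vector, knowledge: List[Vector]) -> Tuple[Vector, List[float]]:
    """Hypothetical context-conditioned attention over knowledge embeddings.

    Assumption: relevance is the dot product between the context vector
    (e.g. an LSTM hidden state) and each knowledge embedding. Returns the
    attention-weighted knowledge summary and the attention weights.
    """
    if not knowledge:
        # No linked knowledge: fall back to a zero vector and no weights.
        return [0.0] * len(context), []
    weights = softmax([dot(context, k) for k in knowledge])
    dim = len(knowledge[0])
    summary = [sum(w * k[i] for w, k in zip(weights, knowledge)) for i in range(dim)]
    return summary, weights


if __name__ == "__main__":
    context = [1.0, 0.0]
    knowledge = [[1.0, 0.0], [0.0, 1.0]]
    summary, weights = knowledge_attention(context, knowledge)
    print(summary, weights)
```

Here the first knowledge vector, which aligns with the context, receives the larger weight, so it dominates the summary. How the selected knowledge is then fed into the LSTM is not specified in the abstract; concatenating it with the token input is only one hypothetical option.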

Availability and implementation: The source code will be made available at https://github.com/zxy951005/KB-CR upon publication. Data are available at http://2011.bionlp-st.org/ and https://github.com/UCDenver-ccp/CRAFT/releases/tag/v3.1.3.

Supplementary information: Supplementary data are available at Bioinformatics online.