Motivation: Joint extraction of entities and relations is an important research direction in information extraction. The volume of scientific and technological biomedical literature is growing rapidly, so automatically extracting entities and their relations from these publications is a key task for advancing biomedical research.
Results: The proposed joint entity and relation extraction model performs both intra-sentence and cross-sentence extraction, alleviating the problem of long-distance dependencies in long documents. The model incorporates several advanced deep learning techniques: (i) a fine-tuned BERT pre-trained text-classification model, (ii) a Graph Convolutional Network, (iii) robust learning against textual label noise with self-mixup training, and (iv) a locally regularized Conditional Random Field. The model identifies entities in complex biomedical literature effectively, extracts triples within and across sentences, reduces the effect of noisy data during training, and improves robustness and accuracy. Experimental results show that the model performs well on the self-built BM_GBD dataset and on public datasets, enabling precise, large language model-enhanced knowledge graph construction for biomedical tasks.
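To make the architecture described above concrete, the following is a minimal PyTorch sketch of a BERT encoder feeding a graph-convolution layer, with a per-token entity tagging head and a pairwise relation head. It is not the authors' released implementation: the class names (GCNLayer, JointExtractionSketch), the adjacency construction, the head/tail indexing, and the layer sizes are illustrative assumptions, and the self-mixup training and locally regularized CRF components are not reproduced here.

```python
# Illustrative sketch only; assumes torch and transformers are installed.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast


class GCNLayer(nn.Module):
    """One graph convolution: H' = ReLU(A @ H @ W) over a token-level graph."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, hidden, adj):
        # adj: (batch, seq, seq) normalized adjacency linking tokens/sentences
        return torch.relu(self.linear(torch.bmm(adj, hidden)))


class JointExtractionSketch(nn.Module):
    """BERT encoder + GCN + entity tag head + relation head (hypothetical names)."""
    def __init__(self, num_tags, num_relations, model_name="bert-base-cased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        dim = self.bert.config.hidden_size
        self.gcn = GCNLayer(dim)
        self.tag_head = nn.Linear(dim, num_tags)            # per-token entity tags
        self.rel_head = nn.Linear(2 * dim, num_relations)   # relation for an entity pair

    def forward(self, input_ids, attention_mask, adj, head_idx, tail_idx):
        h = self.bert(input_ids=input_ids,
                      attention_mask=attention_mask).last_hidden_state
        h = self.gcn(h, adj)              # propagate intra- and cross-sentence context
        tag_logits = self.tag_head(h)     # entity tagging logits
        batch = torch.arange(h.size(0))
        pair = torch.cat([h[batch, head_idx], h[batch, tail_idx]], dim=-1)
        rel_logits = self.rel_head(pair)  # relation logits for the given entity pair
        return tag_logits, rel_logits


if __name__ == "__main__":
    tok = BertTokenizerFast.from_pretrained("bert-base-cased")
    enc = tok("BRCA1 mutations increase breast cancer risk.", return_tensors="pt")
    seq_len = enc["input_ids"].size(1)
    adj = torch.eye(seq_len).unsqueeze(0)  # identity adjacency as a placeholder graph
    model = JointExtractionSketch(num_tags=5, num_relations=3)
    tags, rels = model(enc["input_ids"], enc["attention_mask"], adj,
                       head_idx=torch.tensor([1]), tail_idx=torch.tensor([4]))
```

In the sketch, the GCN layer is what allows information to flow between distant tokens or sentences over the graph, which corresponds to the cross-sentence extraction described above; the actual graph construction and training objectives would follow the paper and released code.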
Availability and implementation: The model and part of the code are available on GitHub at https://github.com/zhaix922/Joint-extraction-of-entity-and-relation.