Exploration of text matching methods in Chinese disease Q&A systems: A method using ensemble based on BERT and boosted tree models

J Biomed Inform. 2021 Mar:115:103683. doi: 10.1016/j.jbi.2021.103683. Epub 2021 Jan 20.

Abstract

Background: Text matching is one of the basic tasks in the field of natural language processing. Owing to the particular characteristics of the Chinese language and of medical texts, text matching has greater application and research value in the medical field. In 2019, at the China Health Information Processing Conference (CHIP), 30,000 sets of real disease Q&A data in Chinese on diabetes, hypertension, hepatitis B, AIDS, and breast cancer were released for public evaluation. A total of 90 teams participated in the evaluation.

Purpose: To identify the best text matching method for Chinese medical Q&A data by participating in an evaluation competition.

Method: After analyzing the Chinese medical Q&A data provided by the competition, we compared the performance of a bidirectional encoder representations from transformers (BERT) model with that of a boosted tree model. We also analyzed the importance of the features extracted through feature engineering. Finally, we integrated the BERT and boosted tree models and demonstrated the effectiveness of the ensemble through a correlation analysis.
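The ensemble and correlation steps described in the Method can be illustrated with a minimal Python sketch. The abstract does not state how the BERT and boosted tree outputs were combined or which correlation coefficient was used, so the weighted probability averaging, the Pearson coefficient, the 0.5 decision threshold, and all function names below are assumptions made for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two models' predicted probabilities."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def average_ensemble(prob_lists, weights=None):
    """Weighted average of per-model probabilities (assumed combination rule)."""
    if weights is None:
        weights = [1.0 / len(prob_lists)] * len(prob_lists)
    return [
        sum(w * p for w, p in zip(weights, probs))
        for probs in zip(*prob_lists)
    ]


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1) from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy values, not data from the study.
bert_probs = [0.9, 0.2, 0.7, 0.4]
tree_probs = [0.8, 0.4, 0.3, 0.6]
labels = [1, 0, 1, 0]

combined = average_ensemble([bert_probs, tree_probs])
predictions = [1 if p >= 0.5 else 0 for p in combined]  # assumed threshold
print(pearson(bert_probs, tree_probs), f1_score(labels, predictions))
```

The intuition behind pairing the two steps is that averaging tends to help most when the combined models are individually accurate but not strongly correlated, which is why a correlation analysis is a natural check before building an ensemble.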

Results: The final F1 score of the ensemble model is 0.90825, ranking first among the 90 participating teams. The highest F1 score of a single BERT model is 0.87443, whereas the highest F1 score of a single boosted tree model is only 0.86915. The F1 score of the BERT multi-model ensemble is 0.87473 (an average increase of 0.756% compared to the single model), and the F1 score of the boosted tree multi-model ensemble is 0.86720 (an average decrease of 0.03% compared to the single model). In the feature importance experiment, the out-degree and in-degree of the Q&A sentences are the most important features. In the correlation experiment, the correlation coefficients between models of the same type are all as high as 0.9, indicating high similarity. The correlation coefficient between models of different types is approximately 0.7, indicating that the two model types differ to some degree. With the ensemble of the two types of models, the F1 score reaches 0.90825, which is 3.88% higher than that of the best single model.
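The in-degree and out-degree features mentioned in the Results can be sketched as follows. The abstract does not define how these degrees are computed, so this example assumes that each question pair (q1, q2) is treated as a directed edge from q1 to q2 and that degrees are counted over the whole dataset. The function name and feature keys are hypothetical.

```python
from collections import Counter


def degree_features(pairs):
    """Count how often each sentence appears as the first (out-degree)
    or second (in-degree) element of a pair, then attach those counts
    to every pair as features."""
    out_degree = Counter(q1 for q1, _ in pairs)
    in_degree = Counter(q2 for _, q2 in pairs)
    return [
        {
            "q1_out": out_degree[q1],
            "q1_in": in_degree[q1],
            "q2_out": out_degree[q2],
            "q2_in": in_degree[q2],
        }
        for q1, q2 in pairs
    ]


# Toy pairs with placeholder sentence identifiers, not data from the study.
pairs = [("a", "b"), ("a", "c"), ("b", "c")]
for pair, feats in zip(pairs, degree_features(pairs)):
    print(pair, feats)
```

Features of this kind would typically be fed to the boosted tree model alongside other engineered features. They describe how the dataset was constructed, not the meaning of the sentences.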

Conclusion: In our study, the proposed model ensemble method was shown to effectively improve on the performance of a single model. It achieves good results in Chinese medical Q&A tasks and generalizes well.

Keywords: BERT; Boosted tree model; Feature engineering; Text matching.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • China
  • Language*
  • Natural Language Processing*