Generative methods are currently popular for medical report generation: they automatically produce professional reports from input images, helping physicians make faster and more accurate decisions. However, current methods face two significant challenges: 1) lesion areas in medical images are often difficult for models to capture accurately, and 2) even when captured, these areas are frequently not described in precise clinical diagnostic terms. To address these problems, we propose a Visual-Linguistic Diagnostic Semantic Enhancement model (VLDSE) that generates high-quality reports. Our approach employs supervised contrastive learning in the Image and Report Semantic Consistency (IRSC) module to bridge the semantic gap between visual and linguistic features. Additionally, we design the Visual Semantic Qualification and Quantification (VSQQ) module and the Post-hoc Semantic Correction (PSC) module to enhance visual semantics and inter-word relationships, respectively. Experiments show that our model achieves promising performance on the publicly available IU X-RAY and MIMIC-MV datasets: on IU X-RAY, it reaches a BLEU-4 score of 18.6%, a 12.7% improvement over the baseline, and on MIMIC-MV, it improves the BLEU-1 score by 10.7% over the baseline. These results confirm the model's ability to generate accurate and fluent descriptions of lesion areas.
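The abstract gives no implementation details for the IRSC objective. As a rough, non-authoritative illustration, the sketch below applies a standard supervised contrastive loss (in the SupCon style) jointly to projected image and report embeddings, so that samples sharing a diagnostic label attract and all others repel. The function name, tensor shapes, and the shared-label definition of positives are assumptions for illustration, not the authors' method.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_alignment(img_emb, txt_emb, labels, temperature=0.07):
    """Hypothetical SupCon-style loss over pooled image and report embeddings.

    img_emb, txt_emb: (N, D) projected features from the two encoders.
    labels:           (N,) diagnostic class ids shared by each image-report pair.
    """
    # Stack both modalities into one batch and L2-normalize, so the
    # dot products below are cosine similarities.
    z = F.normalize(torch.cat([img_emb, txt_emb], dim=0), dim=1)  # (2N, D)
    y = torch.cat([labels, labels], dim=0)                         # (2N,)

    sim = z @ z.t() / temperature                                  # (2N, 2N)
    # Exclude self-similarity with a large negative value (not -inf,
    # which would produce NaNs when multiplied by the mask below).
    self_mask = torch.eye(len(y), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, -1e9)

    # Positives: every other sample (either modality) with the same label.
    pos_mask = (y.unsqueeze(0) == y.unsqueeze(1)) & ~self_mask

    # Log-probability of each candidate under a softmax over all others.
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # Average over positives per anchor, then over anchors.
    loss = -(log_prob * pos_mask.float()).sum(1) / pos_mask.sum(1).clamp(min=1)
    return loss.mean()
```

In practice, such an alignment term would be weighted against the report-generation objective; the projection heads and the granularity of the labels are free design choices not specified in the abstract.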
Keywords: Contrastive learning; Medical report generation; Semantic enhancement; Semantic consistency.
Copyright © 2024. Published by Elsevier Inc.