A transformer fine-tuning strategy for text dialect identification

Neural Comput Appl. 2023;35(8):6115-6124. doi: 10.1007/s00521-022-07944-5. Epub 2022 Nov 15.

Abstract

Online medical consultation can significantly improve the efficiency of primary health care. Recently, many online medical question-answer services have been developed that connect patients with relevant medical consultants based on their questions. Given the linguistic variety in these questions, identifying a patient's social background can improve the referral system by selecting a medical consultant of similar social origin, which makes communication more efficient. This paper proposes a novel fine-tuning strategy for pre-trained transformers to identify the social origin of text authors. When fused with the existing adapter model, the proposed strategy achieves an overall accuracy of 53.96% on the Arabic dialect identification task of the Nuanced Arabic Dialect Identification (NADI) dataset. This accuracy is 0.54% higher than the previous best on the same dataset, which establishes the utility of custom fine-tuning strategies for pre-trained transformer models.

Keywords: Arabic language; Author profiling; Dialect identification; Text classification.