
A three-stage neural model for Arabic Dialect Identification.

Authors :: Mohammed, Abdelmajeed; Jiangbin, Zheng; Murtadha, Ahmed
Source :: Computer Speech & Language. May 2023, Vol. 80, pN.PAG-N.PAG. 1p.
Publication Year :: 2023
Abstract: The Arabic language has several dialects across the twenty-two Arabic-speaking countries in Asia and Africa. Arabic Dialect Identification (ADI) remains a challenging task due to the well-recognized complexity and variation of Arabic dialects. Notably, Arabic dialects share the majority of their tokens. State-of-the-art solutions have been built upon various machine learning approaches. However, they commonly treat all words as equally likely and thus ignore the importance of dialectal words for a given dialect. In this paper, we propose a three-stage neural approach to learn a dialectal semantic representation from a given corpus. Specifically, we first capture the dialect-relevant information, which is then used to model the dialectal vector representation. The goal is to filter out the words shared between dialects, reducing the noisy information fused into the fully connected layer. We introduce two variants: LSTM-based and Transformer-based. Finally, we empirically evaluate the proposed solution through a comparative study on real benchmark datasets, including MADAR, NADI, and QADI. Our extensive experiments show that it consistently achieves state-of-the-art performance. Given the well-recognized difficulty of ADI, the improvement margins can be deemed considerable. The code is available on GitHub: https://github.com/amurtadha/arabic-dialect-identification. [ABSTRACT FROM AUTHOR]
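The abstract does not say how dialect-relevant words are identified. As a rough, non-neural illustration of the general idea of filtering out words shared between dialects before classification, the sketch below scores each token by the fraction of its occurrences that fall in each dialect. It then keeps only the tokens concentrated in a single dialect. The share-based rule, the 0.8 threshold, the function names, and the toy Arabizi corpus are all hypothetical assumptions. This is not the authors' LSTM- or Transformer-based model.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical corpus format: dialect label -> list of tokenized sentences.
Corpus = Dict[str, List[List[str]]]
Shares = Dict[str, Dict[str, float]]


def token_shares(corpus: Corpus) -> Shares:
    """For each token, the fraction of its occurrences that fall in each dialect."""
    per_dialect = {d: Counter(t for s in sents for t in s) for d, sents in corpus.items()}
    total: Counter = Counter()
    for counts in per_dialect.values():
        total.update(counts)
    # Counter returns 0 for tokens absent from a dialect, so every dialect gets a share.
    return {
        tok: {d: per_dialect[d][tok] / n for d in per_dialect}
        for tok, n in total.items()
    }


def filter_shared(tokens: List[str], shares: Shares, threshold: float = 0.8) -> List[str]:
    """Drop unseen tokens and tokens not concentrated in one dialect (hypothetical rule)."""
    return [t for t in tokens if t in shares and max(shares[t].values()) >= threshold]


def predict(tokens: List[str], shares: Shares, threshold: float = 0.8) -> Optional[str]:
    """Toy classifier: sum per-dialect shares over the kept (dialectal) tokens."""
    kept = filter_shared(tokens, shares, threshold)
    if not kept:
        return None  # only shared or unseen words: no dialectal evidence
    score: Counter = Counter()
    for t in kept:
        score.update(shares[t])
    return score.most_common(1)[0][0]


# Toy Arabizi corpus (illustrative only, not drawn from MADAR, NADI, or QADI).
corpus: Corpus = {
    "EGY": [["izzayak", "ya", "sahbi"], ["ana", "3ayez", "aroo7"]],
    "LEV": [["keefak", "ya", "sahbi"], ["ana", "baddi", "roo7"]],
}
shares = token_shares(corpus)
sentence = ["ana", "baddi", "ya", "sahbi"]
print(filter_shared(sentence, shares))  # shared words "ana", "ya", "sahbi" are dropped
print(predict(sentence, shares))
```

In this toy case, "ana", "ya", and "sahbi" occur equally in both dialects and carry no dialectal signal, so only "baddi" survives the filter and determines the prediction. A neural model would learn such weighting rather than use a fixed frequency threshold.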