On the Robustness of Arabic Speech Dialect Identification

Authors :
Sullivan, Peter
Elmadany, AbdelRahim
Abdul-Mageed, Muhammad
Publication Year :
2023

Abstract

Arabic dialect identification (ADI) tools are an important part of the large-scale data collection pipelines necessary for training speech recognition models. As these pipelines require application of ADI tools to potentially out-of-domain data, we aim to investigate how vulnerable the tools may be to this domain shift. With self-supervised learning (SSL) models as a starting point, we evaluate transfer learning and direct classification from SSL features. We undertake our evaluation under rich conditions, with a goal to develop ADI systems from pretrained models and ultimately evaluate performance on newly collected data. In order to understand what factors contribute to model decisions, we carry out a careful human study of a subset of our data. Our analysis confirms that domain shift is a major challenge for ADI models. We also find that while self-training does alleviate this challenge, it may be insufficient for realistic conditions.

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2306.03789
Document Type :
Working Paper