1. Language-specific Acoustic Boundary Learning for Mandarin-English Code-switching Speech Recognition
- Author
- Fan, Zhiyun; Dong, Linhao; Shen, Chen; Liang, Zhenlin; Zhang, Jun; Lu, Lu; Ma, Zejun
- Subjects
FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Sound
- Abstract
Code-switching speech recognition (CSSR) transcribes speech that switches between multiple languages or dialects within a single sentence. The main challenge in this task is that different languages often have similar pronunciations, making it difficult for models to distinguish between them. In this paper, we propose a method for solving the CSSR task from the perspective of language-specific acoustic boundary learning. We introduce language-specific weight estimators (LSWE) to model acoustic boundary learning in different languages separately. Additionally, a non-autoregressive (NAR) decoder and a language change detection (LCD) module are employed to assist in training. Evaluated on the SEAME corpus, our method achieves state-of-the-art mixed error rates (MER) of 16.29% and 22.81% on the test_man and test_sge sets, respectively. We also demonstrate the effectiveness of our method on a 9000-hour in-house meeting code-switching dataset, where it achieves a relative MER reduction of 7.9%.
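The abstract reports results as mixed error rate (MER) but does not spell out the scoring procedure. The sketch below shows one common convention for Mandarin-English data, assumed here rather than taken from the paper: each Mandarin character and each English word counts as one token, and MER is the total edit distance divided by the number of reference tokens. The function names `tokenize_mixed`, `edit_distance`, and `mixed_error_rate` are illustrative only.

```python
import re


def tokenize_mixed(text):
    """Split text into Mandarin characters and lowercased English words.

    Assumption: CJK Unified Ideographs (U+4E00..U+9FFF) are scored per
    character; runs of Latin letters, digits, or apostrophes are scored
    per word. Punctuation and whitespace are dropped.
    """
    return re.findall(r"[\u4e00-\u9fff]|[a-z0-9']+", text.lower())


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (sub/ins/del cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def mixed_error_rate(refs, hyps):
    """Corpus-level MER: summed edit errors over summed reference tokens."""
    errors = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        ref_tokens = tokenize_mixed(ref)
        errors += edit_distance(ref_tokens, tokenize_mixed(hyp))
        total += len(ref_tokens)
    if total == 0:
        raise ValueError("reference transcripts contain no tokens")
    return errors / total


if __name__ == "__main__":
    # Made-up example pair, not drawn from SEAME.
    refs = ["我想吃 pizza today"]
    hyps = ["我想吃 pizza"]
    print(f"MER = {mixed_error_rate(refs, hyps):.2%}")
```

Under this convention, a relative reduction compares two MER values as `(baseline - new) / baseline`, which is distinct from the absolute difference in percentage points.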
- Published
- 2023