
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

Authors :
Chen, Sanyuan
Wu, Yu
Wang, Chengyi
Liu, Shujie
Chen, Zhuo
Wang, Peidong
Liu, Gang
Li, Jinyu
Wu, Jian
Yu, Xiangzhan
Wei, Furu
Publication Year :
2022

Abstract

Recently, self-supervised learning (SSL) has demonstrated strong performance in speaker recognition, even when the pre-training objective is designed for speech recognition. In this paper, we study which factors lead to the success of self-supervised learning on speaker-related tasks, e.g. speaker verification (SV), through a series of carefully designed experiments. Our empirical results on the VoxCeleb1 dataset suggest that the benefit of SSL to the SV task comes from a combination of the masked speech prediction loss, data scale, and model size, while the SSL quantizer has only a minor impact. We further employ the integrated gradients attribution method and loss landscape visualization to understand why self-supervised learning improves speaker recognition performance.

Comment: Accepted by INTERSPEECH 2022

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2204.12765
Document Type :
Working Paper