1. End-to-end speaker identification research based on multi-scale SincNet and CGAN.
- Author
- Wei, Guangcun; Zhang, Yanna; Min, Hang; and Xu, Yunfei
- Subjects
- *DEEP learning, *AUTOMATIC speech recognition, *FILTER banks, *SYSTEM identification, *COMPETITIVE advantage in business
- Abstract
- Deep learning has improved the performance of speaker identification systems in recent years, but it has also presented significant challenges. Data-driven modeling approaches based on DNNs typically rely on large-scale training data, but environmental constraints often make it impossible to collect large amounts of user speech. This work therefore proposes a new SincGAN speaker identification (SI) model that operates directly on the raw input waveform, allowing speaker identification with only a small number of training utterances. Unlike methods that rely on standard hand-crafted features, this method is truly end-to-end. A generator reconstructs the input samples to augment the training data, and a discriminator performs the SI classification task. A multi-scale SincNet layer based on three bespoke filter banks is also added to capture the low-level speech representation of the three channels in the waveform, allowing the model to better capture critical narrowband speaker properties (e.g., pitch and resonance peaks). Experiments show that the method achieves better recognition results on the TIMIT and LibriSpeech datasets when training data is limited. Furthermore, the proposed model has a competitive advantage over existing models. [ABSTRACT FROM AUTHOR]
- Published
- 2023