
Self multi-head attention for speaker recognition

Authors :
Universitat Politècnica de Catalunya. Doctorat en Teoria del Senyal i Comunicacions
Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
India Massana, Miquel Àngel
Safari, Pooyan
Hernando Pericás, Francisco Javier
Publication Year :
2019

Abstract

Most state-of-the-art Deep Learning (DL) approaches for speaker recognition work on a short utterance level. Given the speech signal, these algorithms extract a sequence of speaker embeddings from short segments, which are averaged to obtain an utterance-level speaker representation. In this work we propose the use of an attention mechanism to obtain a discriminative speaker embedding given non-fixed-length speech utterances. Our system is based on a Convolutional Neural Network (CNN) that encodes short-term speaker features from the spectrogram and a self multi-head attention model that maps these representations into a long-term speaker embedding. The attention model that we propose produces multiple alignments from different subsegments of the CNN encoded states over the sequence. Hence this mechanism works as a pooling layer which decides the most discriminative features over the sequence to obtain an utterance-level representation. We have tested this approach on the verification task of the VoxCeleb1 dataset. The results show that self multi-head attention outperforms both temporal and statistical pooling methods with an 18% relative improvement in EER. The obtained results show a 58% relative improvement in EER compared to i-vector+PLDA.
Peer Reviewed
Postprint (published version)
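
To make the pooling idea concrete, the following is a minimal PyTorch sketch of multi-head attentive pooling over a sequence of CNN-encoded frame-level features. It is not the authors' implementation: the class name MultiHeadAttentivePooling, the per-head learnable query vectors, and the feature dimensions are illustrative assumptions; only the general scheme (split the encoded states into sub-vectors per head, compute one alignment per head over time, and concatenate the weighted sums into an utterance-level embedding) follows the abstract.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttentivePooling(nn.Module):
    """Sketch of self multi-head attention used as a pooling layer (assumed parameterization)."""
    def __init__(self, feat_dim, num_heads):
        super().__init__()
        assert feat_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = feat_dim // num_heads
        # One learnable attention (query) vector per head -- a hypothetical choice.
        self.attention = nn.Parameter(torch.randn(num_heads, self.head_dim))

    def forward(self, encoded):  # encoded: (batch, time, feat_dim), CNN outputs
        b, t, _ = encoded.shape
        # Split each encoded state into one sub-vector per head.
        heads = encoded.view(b, t, self.num_heads, self.head_dim)
        # Alignment scores: dot product between each sub-vector and its head's query.
        scores = torch.einsum('bthd,hd->bth', heads, self.attention)
        weights = F.softmax(scores, dim=1)  # normalize over the time axis
        # Weighted sum over time per head, then concatenate the heads.
        pooled = torch.einsum('bthd,bth->bhd', heads, weights)
        return pooled.reshape(b, self.num_heads * self.head_dim)  # utterance-level embedding

# Example: pool a batch of two 200-frame sequences of 512-dimensional encoded features.
pooling = MultiHeadAttentivePooling(feat_dim=512, num_heads=8)
embedding = pooling(torch.randn(2, 200, 512))  # -> shape (2, 512)

Because the attention weights are computed per head and normalized over time, the layer handles variable-length sequences and selects the most discriminative frames, in contrast to plain temporal averaging.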

Details

Database :
OAIster
Notes :
5 p., application/pdf, English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1151824285
Document Type :
Electronic Resource