Emotion embedding framework with emotional self-attention mechanism for speaker recognition.

Authors :
Li, Dongdong
Yang, Zhuo
Liu, Jinlin
Yang, Hai
Wang, Zhe
Source :
Expert Systems with Applications. Mar2024:Part F, Vol. 238, pN.PAG-N.PAG. 1p.
Publication Year :
2024

Abstract

The emotional state of speech has a great impact on the performance of speaker recognition (SR) systems. Many researchers focus on mapping speech with different emotions to an emotion-invariant embedding, which reduces the diversity of the data. This paper proposes a new emotion embedding framework with a self-attention mechanism for speaker recognition. First, during the development phase, several deep neural networks (DNNs) are trained to classify speakers in different emotional states; these serve as emotion embedding extractors. Then, at the enrollment stage, these pre-trained models are used to generate emotion embeddings from neutral speech. To make the final speaker embedding more representative, the classification model is trained with a self-attention mechanism along the emotion dimension, so that the framework automatically assigns weights to the emotion embeddings. Experiments were carried out on both the Mandarin Affective Speech Corpus (MASC) and the Crowd-Sourced Emotional Multimodal Actors Dataset (CREMA-D). The results show that, compared with state-of-the-art methods, the proposed method achieves the best Identification Rate (IR) and Equal Error Rate (EER): 59.14% and 15.79% on MASC, and 75.98% and 8.14% on CREMA-D. In addition, cross-database experiments further demonstrate the practicability of the method in real scenes.

• An emotion embedding framework for robust emotional speaker recognition is proposed.
• Emotional feature extractors are pre-trained to obtain prior emotional representations.
• Various emotion embeddings are extracted by decomposing the emotional features.
• Self-attention is introduced to measure the importance of emotion embeddings.
• Emotional speaker embedding enriches neutral speech with emotional information.

[ABSTRACT FROM AUTHOR]
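
The weighting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple self-attentive pooling (a tanh-scored softmax over the emotion axis), and the function names, the parameters `w` and `v`, and the dimensions are hypothetical choices made for illustration. Given one embedding per emotion extractor, it computes a weight for each embedding and returns their weighted sum as the speaker embedding.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(emotion_embeddings, w, v):
    """Weight per-emotion embeddings and sum them into one speaker embedding.

    emotion_embeddings: E lists of length D (one per emotion extractor).
    w: D rows of length H (hypothetical projection matrix).
    v: list of length H (hypothetical scoring vector).
    Returns (pooled embedding of length D, attention weights of length E).
    """
    scores = []
    for emb in emotion_embeddings:
        # hidden[j] = tanh(sum_d emb[d] * w[d][j])
        hidden = [
            math.tanh(sum(x * w_row[j] for x, w_row in zip(emb, w)))
            for j in range(len(v))
        ]
        scores.append(sum(h * vj for h, vj in zip(hidden, v)))
    weights = softmax(scores)  # one weight per emotion, summing to 1
    dim = len(emotion_embeddings[0])
    pooled = [
        sum(a * emb[d] for a, emb in zip(weights, emotion_embeddings))
        for d in range(dim)
    ]
    return pooled, weights


# Toy usage with random values standing in for real extractor outputs.
random.seed(0)
E, D, H = 5, 8, 4  # assumed sizes: emotions, embedding dim, hidden dim
embeddings = [[random.gauss(0, 1) for _ in range(D)] for _ in range(E)]
w = [[random.gauss(0, 1) for _ in range(H)] for _ in range(D)]
v = [random.gauss(0, 1) for _ in range(H)]
speaker_embedding, weights = attention_pool(embeddings, w, v)
```

In a trained system `w` and `v` would be learned jointly with the speaker classifier; here they are random, so the resulting weights carry no meaning beyond showing the shape of the computation.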

Details

Language :
English
ISSN :
09574174
Volume :
238
Database :
Academic Search Index
Journal :
Expert Systems with Applications
Publication Type :
Academic Journal
Accession number :
173694052
Full Text :
https://doi.org/10.1016/j.eswa.2023.122244