Compensating for speaker or lexical variabilities in speech for emotion recognition
- Source :
- Speech Communication. 57:1-12
- Publication Year :
- 2014
- Publisher :
- Elsevier BV, 2014.
Abstract
- Affect recognition is a crucial requirement for future human-machine interfaces to respond effectively to nonverbal behaviors of the user. Speech emotion recognition systems analyze acoustic features to deduce the speaker's emotional state. However, the human voice conveys a mixture of information including speaker, lexical, cultural, physiological and emotional traits. The presence of these communication aspects introduces variabilities that affect the performance of an emotion recognition system. Therefore, building robust emotional models requires careful consideration to compensate for the effect of these variabilities. This study aims to factorize speaker characteristics, verbal content and expressive behaviors in various acoustic features. The factorization technique consists of building phoneme-level trajectory models for the features. We propose a metric to quantify the dependency between acoustic features and communication traits (i.e., speaker, lexical and emotional factors). This metric, which is motivated by the mutual information framework, estimates the uncertainty reduction in the trajectory models when a given trait is considered. The analysis provides important insights into the dependency between the features and the aforementioned factors. Motivated by these results, we propose a feature normalization technique based on the whitening transformation that aims to compensate for speaker and lexical variabilities. The benefit of employing this normalization scheme is validated with the presented factor analysis method. The emotion recognition experiments show that the normalization approach can attenuate the variability imposed by the verbal content and speaker identity, yielding 4.1% and 2.4% relative performance improvements on a selected set of features, respectively.
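The abstract names a whitening transformation as the basis of the normalization but gives no detail. As a rough illustration only, the sketch below applies a generic per-group whitening (center each group, then rotate and scale it to identity covariance) to a feature matrix. It is not the paper's procedure, which operates on phoneme-level trajectory models. The function names `whiten` and `whiten_per_group`, the `eps` floor, and the use of one group per speaker are assumptions made for this example.

```python
import numpy as np


def whiten(features, eps=1e-8):
    """Center the rows of `features` and map them to ~identity covariance.

    Generic PCA whitening; `eps` (an assumed safeguard) floors tiny
    eigenvalues so near-constant directions do not blow up.
    """
    centered = features - features.mean(axis=0)
    cov = np.atleast_2d(np.cov(centered, rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(cov)  # cov = U diag(eigvals) U^T
    scale = 1.0 / np.sqrt(np.maximum(eigvals, eps))
    # Project onto the eigenvectors and rescale each direction to unit variance.
    return centered @ (eigvecs * scale)


def whiten_per_group(features, group_ids):
    """Whiten each group (hypothetically, each speaker or phoneme) separately."""
    features = np.asarray(features, dtype=float)
    group_ids = np.asarray(group_ids)
    out = np.empty_like(features)
    for group in np.unique(group_ids):
        mask = group_ids == group
        out[mask] = whiten(features[mask])
    return out
```

Called as `whiten_per_group(X, speaker_ids)`, with `X` of shape `(n_frames, n_features)`, this removes each group's mean and covariance structure, so that group-specific offsets and scaling no longer dominate the features passed to an emotion classifier. Each group needs more samples than feature dimensions for the covariance estimate to be well conditioned.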
- Subjects :
- Normalization (statistics)
Linguistics and Language
Computer science
Communication
Speech recognition
Mutual information
Speaker recognition
Language and Linguistics
Computer Science Applications
Speaker diarisation
Modeling and Simulation
Feature (machine learning)
Computer Vision and Pattern Recognition
Artificial intelligence
Set (psychology)
Software
Human voice
Natural language processing
Uncertainty reduction theory
Details
- ISSN :
- 0167-6393
- Volume :
- 57
- Database :
- OpenAIRE
- Journal :
- Speech Communication
- Accession number :
- edsair.doi...........43efde8c13525c6cc1599d71dac5b231
- Full Text :
- https://doi.org/10.1016/j.specom.2013.07.011