
Spontaneous Speech Emotion Recognition Using Multiscale Deep Convolutional LSTM

Authors :
Shiqing Zhang
Xiaoming Zhao
Qi Tian
Source :
IEEE Transactions on Affective Computing. 13:680-688
Publication Year :
2022
Publisher :
Institute of Electrical and Electronics Engineers (IEEE), 2022.

Abstract

Emotion recognition in real-world (in-the-wild) settings has recently attracted extensive attention in affective computing, because the spontaneous emotions that arise in such settings are more challenging to identify than other emotions. Motivated by the diverse effects that different lengths of audio spectrograms have on emotion identification, this paper proposes a multiscale deep convolutional long short-term memory (LSTM) framework for spontaneous speech emotion recognition. First, a deep convolutional neural network (CNN) learns segment-level features from image-like, three-channel spectrogram inputs. Then, a deep LSTM operates on the learned segment-level CNN features to capture the temporal dependency among all segments of an utterance, yielding utterance-level emotion recognition. Finally, the recognition results obtained by combining the CNN with the LSTM at multiple segment-level spectrogram lengths are integrated by a score-level fusion strategy. Experimental results on two challenging spontaneous emotional datasets, the AFEW5.0 and BAUM-1s databases, demonstrate the promising performance of the proposed method, which outperforms state-of-the-art methods.
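The final score-level fusion step can be illustrated with a minimal sketch: each segment-length branch (a CNN+LSTM pipeline) produces per-class emotion scores for an utterance, and the fused prediction is taken from the combined scores. Note this is an assumption-laden illustration — the emotion labels, the number of branches, and the use of simple averaging as the fusion rule are all hypothetical, not taken from the paper, which may weight branches differently.

```python
# Hedged sketch of score-level fusion across multiscale branches.
# Each branch corresponds to one segment length of the spectrogram and
# outputs per-class scores; averaging is an assumed (illustrative) rule.

def fuse_scores(branch_scores):
    """Average per-class scores across branches (one list per segment length)."""
    n_branches = len(branch_scores)
    n_classes = len(branch_scores[0])
    return [sum(scores[c] for scores in branch_scores) / n_branches
            for c in range(n_classes)]

def predict(branch_scores, labels):
    """Return the label whose fused score is highest."""
    fused = fuse_scores(branch_scores)
    return labels[max(range(len(fused)), key=fused.__getitem__)]

# Example: three segment-length branches, four hypothetical emotion classes.
labels = ["angry", "happy", "neutral", "sad"]
scores = [
    [0.1, 0.6, 0.2, 0.1],   # branch trained on short segments
    [0.2, 0.5, 0.2, 0.1],   # medium segments
    [0.1, 0.4, 0.3, 0.2],   # long segments
]
print(predict(scores, labels))  # "happy" has the highest average score
```

Averaging at the score level (rather than concatenating features) lets each branch be trained independently and makes it cheap to add or drop segment lengths.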

Details

ISSN :
23719850
Volume :
13
Database :
OpenAIRE
Journal :
IEEE Transactions on Affective Computing
Accession number :
edsair.doi...........316a8362bfb05cdc871426eb9c5afd57
Full Text :
https://doi.org/10.1109/taffc.2019.2947464