Speech Emotion Recognition with Local-Global Aware Deep Representation Learning
- Source :
- ICASSP
- Publication Year :
- 2020
- Publisher :
- IEEE, 2020.
Abstract
- Convolutional neural network (CNN) based deep representation learning methods for speech emotion recognition (SER) have demonstrated great success. However, the basic design of CNNs restricts them to modeling local information well. Capsule networks (CapsNets) can overcome this shortcoming of CNNs by capturing shallow global features from the spectrogram, although CapsNet cannot learn local or deep global information. In this paper, we propose a local-global aware deep representation learning system that consists mainly of two modules. One module contains a multi-scale CNN, the time-frequency CNN (TFCNN), to learn the local representation. In the other module, we introduce a structure with dense connections between multiple blocks to learn shallow and deep global information. Every block in this structure is a complete CapsNet improved by a new routing algorithm. The local and global representations are fed to the classifier, achieving an absolute improvement of at least 4.25% over the benchmarks on IEMOCAP.
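The abstract does not describe the paper's new routing algorithm, so the sketch below shows only the standard routing-by-agreement from the original CapsNet (Sabour et al., 2017) that each block builds on. It is not the authors' method. The function names, the three routing iterations, and the toy sizes (32 input capsules, 4 output capsules standing in for emotion classes, 16-D capsule vectors) are illustrative assumptions.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Capsule non-linearity: keeps the vector's direction, maps its length into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iters=3):
    """Standard routing-by-agreement (Sabour et al., 2017), NOT the paper's new routing.

    u_hat: prediction vectors of shape (num_in, num_out, dim_out); u_hat[i, j] is
           what lower-level capsule i predicts for upper-level capsule j.
    Returns the upper-level capsule outputs v with shape (num_out, dim_out).
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))  # routing logits, start uniform
    for _ in range(num_iters):
        # Coupling coefficients: softmax over upper capsules for each input capsule.
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)
        # Weighted sum of predictions for each upper capsule, then squash.
        s = np.einsum("ij,ijd->jd", c, u_hat)
        v = squash(s)
        # Raise the logit wherever a prediction agrees (dot product) with the output.
        b = b + np.einsum("ijd,jd->ij", u_hat, v)
    return v

# Toy usage with hypothetical sizes: 32 input capsules routed to 4 class capsules.
rng = np.random.default_rng(0)
u_hat = rng.normal(size=(32, 4, 16))
v = dynamic_routing(u_hat)
class_scores = np.linalg.norm(v, axis=-1)  # capsule lengths act as class scores
print(v.shape, class_scores.round(3))
```

In a CapsNet the length of each output capsule is read as the presence score for its class, which is why `squash` bounds lengths below 1.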
- Subjects :
- Computer science
business.industry
020206 networking & telecommunications
Economic shortage
02 engineering and technology
010501 environmental sciences
01 natural sciences
Convolutional neural network
0202 electrical engineering, electronic engineering, information engineering
Spectrogram
Emotion recognition
Artificial intelligence
business
Feature learning
Classifier (UML)
0105 earth and related environmental sciences
Details
- Database :
- OpenAIRE
- Journal :
- ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Accession number :
- edsair.doi...........ed00b3f63f94c96dd1e9a3d07ce9abcb
- Full Text :
- https://doi.org/10.1109/icassp40776.2020.9053192