Learning Affective Features With a Hybrid Deep Model for Audio–Visual Emotion Recognition.

Authors :: Zhang, Shiqing
Zhang, Shiliang
Huang, Tiejun
Gao, Wen
Tian, Qi
Source :: IEEE Transactions on Circuits & Systems for Video Technology. Oct2018, Vol. 28 Issue 10, p3030-3043. 14p.
Publication Year :: 2018
Abstract: Emotion recognition is challenging due to the emotional gap between emotions and audio–visual features. Motivated by the powerful feature learning ability of deep neural networks, this paper proposes to bridge the emotional gap by using a hybrid deep model, which first produces audio–visual segment features with a Convolutional Neural Network (CNN) and a 3D-CNN, then fuses these segment features in a Deep Belief Network (DBN). The proposed method is trained in two stages. First, CNN and 3D-CNN models pre-trained on corresponding large-scale image and video classification tasks are fine-tuned on emotion recognition tasks to learn audio and visual segment features, respectively. Second, the outputs of the CNN and 3D-CNN models are combined in a fusion network built with a DBN model. The fusion network is trained to jointly learn a discriminative audio–visual segment feature representation. After the segment features learned by the DBN are average-pooled to form a fixed-length global video feature, a linear Support Vector Machine is used for video emotion classification. Experimental results on three public audio–visual emotional databases, namely the acted RML database, the acted eNTERFACE05 database, and the spontaneous BAUM-1s database, demonstrate the promising performance of the proposed method. To the best of our knowledge, this is an early work fusing audio and visual cues with CNN, 3D-CNN, and DBN for audio–visual emotion recognition. [ABSTRACT FROM AUTHOR]
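The final step the abstract describes (average-pooling per-segment features into one fixed-length video feature, then classifying it with a linear SVM) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the DBN segment features are already computed, the function names `average_pool` and `linear_predict` are hypothetical, and the one-vs-rest decision rule with made-up weights merely stands in for a trained linear SVM (SVM training itself is not shown).

```python
from typing import List, Sequence


def average_pool(segment_features: Sequence[Sequence[float]]) -> List[float]:
    """Average a variable number of equal-length segment vectors into a
    single fixed-length video-level vector (element-wise mean)."""
    if not segment_features:
        raise ValueError("need at least one segment feature")
    dim = len(segment_features[0])
    if any(len(s) != dim for s in segment_features):
        raise ValueError("all segment features must have the same dimension")
    n = len(segment_features)
    return [sum(s[i] for s in segment_features) / n for i in range(dim)]


def linear_predict(x: Sequence[float],
                   weights: Sequence[Sequence[float]],
                   biases: Sequence[float]) -> int:
    """Hypothetical one-vs-rest linear decision rule: compute
    score_c = w_c . x + b_c for every class c and return the argmax.
    The weights and biases would come from a trained linear SVM."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Three hypothetical 2-D segment features from one video clip.
    segments = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    video_feature = average_pool(segments)  # fixed length regardless of segment count
    # Made-up weights for two emotion classes (illustration only).
    w = [[1.0, -1.0], [-0.5, 1.0]]
    b = [0.0, 0.1]
    print(video_feature, linear_predict(video_feature, w, b))
```

Averaging makes the video descriptor independent of how many segments a clip contains, which is what allows a single fixed-input linear classifier to score videos of different lengths.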

Subjects :: *EMOTION recognition
*DEEP learning
*AUDIOVISUAL materials
*FEATURE extraction
*IMAGE processing
*ARTIFICIAL neural networks

Details

Language :: English
ISSN :: 10518215
Volume :: 28
Issue :: 10
Database :: Academic Search Index
Journal :: IEEE Transactions on Circuits & Systems for Video Technology
Publication Type :: Academic Journal
Accession number :: 132683746
Full Text :: https://doi.org/10.1109/TCSVT.2017.2719043
