Audio and Video Bimodal Emotion Recognition in Social Networks Based on Improved AlexNet Network and Attention Mechanism.

Authors :
Min Liu
Jun Tang
Source :
Journal of Information Processing Systems; Aug 2021, Vol. 17 Issue 4, p754-771, 18p
Publication Year :
2021

Abstract

In continuous dimensional emotion recognition, the regions that highlight emotional expression differ across modalities, and different modalities influence the estimated emotional state to different degrees. This paper therefore studies the fusion of the two modalities most important for emotion recognition, voice and facial expression, and proposes a dual-modal emotion recognition method that combines an improved AlexNet network with an attention mechanism. After simple preprocessing of the audio and video signals, audio features are first extracted using prior knowledge. Facial expression features are then extracted by the improved AlexNet network. Finally, a multimodal attention mechanism fuses the facial expression and audio features, and an improved loss function mitigates the missing-modality problem, improving the robustness of the model and its emotion recognition performance. Experimental results show that the concordance correlation coefficients (CCC) of the proposed model on the arousal and valence dimensions were 0.729 and 0.718, respectively, outperforming several comparison algorithms. [ABSTRACT FROM AUTHOR]
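The evaluation metric reported in the abstract, the concordance correlation coefficient (CCC), is a standard measure for dimensional emotion regression: it combines Pearson correlation with a penalty for mean and variance mismatch between predictions and labels. A minimal sketch of how such a score could be computed (this is the generic CCC definition, not code from the paper):

```python
import numpy as np

def ccc(y_true, y_pred):
    """Concordance correlation coefficient between label and prediction sequences.

    CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)
    Ranges in [-1, 1]; 1.0 means perfect agreement in scale, location, and order.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()          # population variance
    cov = ((y_true - mean_t) * (y_pred - mean_p)).mean()
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)

# Perfect agreement gives 1.0; a constant offset lowers the score
# even though the Pearson correlation would still be 1.
print(ccc([0.1, 0.2, 0.3], [0.1, 0.2, 0.3]))  # 1.0
print(ccc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```

Note the mean-difference term in the denominator: unlike plain correlation, CCC punishes systematically biased arousal/valence predictions, which is why it is the standard choice for this task.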

Details

Language :
English
ISSN :
1976-913X
Volume :
17
Issue :
4
Database :
Complementary Index
Journal :
Journal of Information Processing Systems
Publication Type :
Academic Journal
Accession number :
152332059
Full Text :
https://doi.org/10.3745/JIPS.02.0161