
A Versatile Multimodal Learning Framework for Zero-Shot Emotion Recognition

Authors :
Qi, Fan
Zhang, Huaiwen
Yang, Xiaoshan
Xu, Changsheng
Source :
IEEE Transactions on Circuits and Systems for Video Technology; 2024, Vol. 34, Issue 7, pp. 5728-5741, 14p
Publication Year :
2024

Abstract

Multi-modal Emotion Recognition (MER) aims to identify human emotions from heterogeneous modalities. As emotion theories develop, increasingly novel and fine-grained concepts are introduced to describe human emotional states, so real-world recognition systems often encounter emotion labels unseen during training. To address this challenge, we propose a versatile zero-shot MER framework that refines emotion label embeddings to capture inter-label relationships and improve discrimination between labels. We integrate prior knowledge into a novel affective graph space that generates tailored label embeddings. To obtain multimodal representations, we disentangle the features of each modality into egocentric and altruistic components using adversarial learning, then fuse these components hierarchically with a hybrid co-attention mechanism. Furthermore, an emotion-guided decoder exploits label-modal dependencies to generate adaptive multimodal representations conditioned on the emotion embeddings. We conduct extensive experiments with different multimodal combinations, including visual-acoustic and visual-textual inputs, on four datasets in both single-label and multi-label zero-shot settings. The results demonstrate the superiority of the proposed framework over state-of-the-art methods.
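The zero-shot step the abstract describes rests on a common mechanism: prediction scores come from the compatibility between a fused multimodal representation and emotion label embeddings in a shared space, so labels unseen during training are scored exactly like seen ones. The following is a minimal, hypothetical Python sketch of that scoring step only; the function name, the choice of cosine similarity, and all shapes are illustrative assumptions, not the authors' implementation.

    # Hypothetical sketch of zero-shot scoring against emotion label
    # embeddings. Names, shapes, and the cosine-similarity choice are
    # illustrative assumptions, not the paper's actual method.
    import numpy as np

    def cosine_scores(fused: np.ndarray, label_embs: np.ndarray) -> np.ndarray:
        """Score one fused multimodal representation against every emotion
        label embedding. Unseen labels need no special handling because the
        prediction depends only on position in the shared embedding space."""
        fused = fused / np.linalg.norm(fused)
        label_embs = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
        return label_embs @ fused  # one similarity score per candidate label

    # Toy usage: 4 candidate emotions (two "unseen" at training time), 128-d space.
    rng = np.random.default_rng(0)
    labels = ["joy", "anger", "awe", "nostalgia"]  # last two unseen in training
    label_embs = rng.normal(size=(4, 128))  # stand-in for graph-derived embeddings
    fused = rng.normal(size=128)            # stand-in for the fused representation
    scores = cosine_scores(fused, label_embs)
    print(labels[int(scores.argmax())])     # single-label prediction

In the framework described above, the label embeddings would come from the affective graph space and the fused vector from the co-attention and emotion-guided decoder stages; in a multi-label setting, one would threshold the scores rather than take the argmax.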

Details

Language :
English
ISSN :
1051-8215 (print), 1558-2205 (electronic)
Volume :
34
Issue :
7
Database :
Supplemental Index
Journal :
IEEE Transactions on Circuits and Systems for Video Technology
Publication Type :
Periodical
Accession number :
ejs66895150
Full Text :
https://doi.org/10.1109/TCSVT.2024.3362270