
Triple disentangled representation learning for multimodal affective analysis.

Authors :
Zhou, Ying
Liang, Xuefeng
Chen, Han
Zhao, Yin
Chen, Xin
Yu, Lida
Source :
Information Fusion. Feb. 2025, Vol. 114.
Publication Year :
2025

Abstract

In multimodal affective analysis (MAA) tasks, the heterogeneity among modalities has propelled the exploration of disentanglement methods as a pivotal research area. Many emerging studies focus on disentangling modality-invariant and modality-specific representations from the input data and then fusing them for prediction. However, our study shows that modality-specific representations may contain information that is irrelevant to, or conflicting with, the tasks, which degrades the effectiveness of the learned multimodal representations. We revisit the disentanglement issue and propose a novel triple disentanglement approach, TriDiRA, which disentangles the modality-invariant, effective modality-specific, and ineffective modality-specific representations from the input data. By fusing only the modality-invariant and effective modality-specific representations, TriDiRA significantly alleviates the impact of irrelevant and conflicting information across modalities during model training and prediction. Extensive experiments conducted on four benchmark datasets demonstrate the effectiveness and generalization of our triple disentanglement, which outperforms SOTA methods. The code is available at https://anonymous.4open.science/r/TriDiRA.

• Propose a triple disentanglement multimodal representation learning method.
• Design a dual-out attention output module for triple disentanglement.
• Eliminate label-irrelevant representations from modality-specific representations.
• Outperform SOTA methods on four benchmark datasets.

[ABSTRACT FROM AUTHOR]
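The abstract's core idea can be sketched in code: each modality's feature is split into three parts, and only two of them reach the fusion step. This is a minimal illustrative sketch, not the paper's architecture — the use of plain linear projections, the feature sizes, and the concatenation-based fusion are all assumptions standing in for the learned encoders and the dual-out attention module described by the authors.

```python
import numpy as np

rng = np.random.default_rng(0)

DIM_IN, DIM_OUT = 16, 8  # hypothetical feature sizes, for illustration only

def make_projection():
    """A random linear projection standing in for a learned encoder."""
    return rng.standard_normal((DIM_IN, DIM_OUT)) / np.sqrt(DIM_IN)

# One projection triple per modality: the triple disentanglement splits each
# modality feature into modality-invariant, effective modality-specific, and
# ineffective modality-specific parts.
modalities = ["text", "audio", "vision"]
proj = {
    m: {
        "invariant": make_projection(),
        "effective": make_projection(),
        "ineffective": make_projection(),  # learned but discarded at fusion
    }
    for m in modalities
}

def disentangle(features):
    """Split each modality's feature vector into the three parts."""
    return {
        m: {part: features[m] @ W for part, W in proj[m].items()}
        for m in modalities
    }

def fuse(parts):
    """Fuse only the invariant and effective-specific parts (by concatenation),
    so label-irrelevant information never reaches the predictor."""
    kept = []
    for m in modalities:
        kept.append(parts[m]["invariant"])
        kept.append(parts[m]["effective"])
    return np.concatenate(kept)

features = {m: rng.standard_normal(DIM_IN) for m in modalities}
fused = fuse(disentangle(features))
print(fused.shape)  # 3 modalities x 2 kept parts x DIM_OUT -> (48,)
```

In the actual method the three parts would be shaped by training objectives (e.g. keeping the ineffective part away from the task labels); here the split is purely structural, to show which representations are fused and which are dropped.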

Details

Language :
English
ISSN :
1566-2535
Volume :
114
Database :
Academic Search Index
Journal :
Information Fusion
Publication Type :
Academic Journal
Accession number :
180494339
Full Text :
https://doi.org/10.1016/j.inffus.2024.102663