Transformer-Based Self-Supervised Multimodal Representation Learning for Wearable Emotion Recognition.
- Source :
- IEEE Transactions on Affective Computing; 2024, Vol. 17, p157-172, 16p
- Publication Year :
- 2024
Abstract
- Recently, wearable emotion recognition based on peripheral physiological signals has attracted considerable attention due to its less invasive nature and its applicability in real-life scenarios. However, how to effectively fuse multimodal data remains a challenging problem. Moreover, traditional fully-supervised approaches suffer from overfitting given limited labeled data. To address these issues, we propose a novel self-supervised learning (SSL) framework for wearable emotion recognition, in which efficient multimodal fusion is realized with temporal convolution-based modality-specific encoders and a transformer-based shared encoder, capturing both intra-modal and inter-modal correlations. Extensive unlabeled data are automatically assigned labels through five signal transformations, and the proposed SSL model is pre-trained with signal transformation recognition as a pretext task, allowing the extraction of generalized multimodal representations for emotion-related downstream tasks. For evaluation, the proposed SSL model was first pre-trained on a large-scale self-collected physiological dataset, and the resulting encoder was subsequently frozen or fine-tuned on three public supervised emotion recognition datasets. Ultimately, our SSL-based method achieved state-of-the-art results on various emotion classification tasks. Meanwhile, the proposed model was shown to be more accurate and robust than fully-supervised methods in low-data regimes. [ABSTRACT FROM AUTHOR]
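- The sketch below illustrates, in broad strokes, the kind of architecture the abstract describes: temporal-convolution modality-specific encoders whose token sequences are fused by a transformer-based shared encoder, with a head that predicts which signal transformation was applied (the pretext task). This is not the authors' code; all layer sizes, the number and identity of modalities, and the multi-label pretext head are assumptions for illustration only.

```python
# Hedged sketch of the described multimodal SSL architecture (assumed details,
# not the paper's implementation).
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """Temporal convolution encoder for one peripheral physiological signal."""
    def __init__(self, in_channels: int, d_model: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, d_model, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=7, padding=3),
            nn.ReLU(),
        )

    def forward(self, x):            # x: (batch, channels, time)
        return self.net(x)           # (batch, d_model, time)

class SSLModel(nn.Module):
    """Modality-specific encoders + shared transformer + pretext head."""
    def __init__(self, modal_channels=(1, 1, 1), d_model=64,
                 n_transforms=5, n_heads=4, n_layers=2):
        super().__init__()
        self.encoders = nn.ModuleList(
            [ModalityEncoder(c, d_model) for c in modal_channels])
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.shared = nn.TransformerEncoder(layer, n_layers)
        # A single multi-label head over the five transformations is assumed
        # here; the paper's exact head design may differ.
        self.pretext_head = nn.Linear(d_model, n_transforms)

    def forward(self, signals):      # list of (batch, channels, time) tensors
        tokens = [enc(x).transpose(1, 2)            # (batch, time, d_model)
                  for enc, x in zip(self.encoders, signals)]
        fused = self.shared(torch.cat(tokens, dim=1))  # inter-modal attention
        pooled = fused.mean(dim=1)                     # temporal pooling
        return self.pretext_head(pooled)               # transformation logits

# Example: three 1-channel signals (e.g. EDA, BVP, skin temperature), 256 steps.
model = SSLModel()
batch = [torch.randn(8, 1, 256) for _ in range(3)]
logits = model(batch)                # (8, 5) pretext-task predictions
```

- After pre-training on the pretext task, the encoders and shared transformer would be frozen or fine-tuned with an emotion classification head on the downstream datasets, as described in the abstract.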
Details
- Language :
- English
- ISSN :
- 1949-3045
- Volume :
- 17
- Database :
- Complementary Index
- Journal :
- IEEE Transactions on Affective Computing
- Publication Type :
- Academic Journal
- Accession number :
- 175943072
- Full Text :
- https://doi.org/10.1109/TAFFC.2023.3263907