Swin-FER: Swin Transformer for Facial Expression Recognition.
- Source :
- Applied Sciences (2076-3417); Jul2024, Vol. 14 Issue 14, p6125, 14p
- Publication Year :
- 2024
Abstract
- The ability of transformers to capture global context information is highly beneficial for recognizing subtle differences in facial expressions. However, compared to convolutional neural networks, transformers must compute dependencies between each element and all other elements, leading to high computational complexity. Additionally, the large number of model parameters requires extensive training data to avoid overfitting. In this paper, we make targeted improvements to the Swin transformer network according to the characteristics of facial expression recognition tasks. The proposed Swin-Fer network adopts a fusion strategy from the middle layers to the deeper layers and employs data dimension conversion so that the network perceives more spatial information. Furthermore, we integrate a mean module, a split module, and a group convolution strategy to effectively control the number of parameters. On the Fer2013 dataset, an in-the-wild dataset, Swin-Fer achieved an accuracy of 71.11%. On the CK+ dataset, an in-the-lab dataset, the accuracy reached 100%. [ABSTRACT FROM AUTHOR]
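- The abstract names group convolution as one of the strategies used to control parameter count. As a minimal sketch of that general idea (not the authors' implementation; the channel sizes and group count below are illustrative assumptions, not values from the paper), splitting channels into groups divides the convolutional weight count roughly by the number of groups:

```python
# Sketch of group convolution as a parameter-control strategy.
# Channel sizes and the `groups` value are illustrative assumptions,
# not taken from the Swin-Fer paper.
import torch.nn as nn

in_ch, out_ch = 96, 96

# Standard convolution: every output channel mixes every input channel.
standard = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

# Group convolution: channels are split into independent groups,
# dividing the weight count by the number of groups.
grouped = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, groups=4)

def n_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

print(n_params(standard))  # 96*96*3*3 + 96 bias = 83,040
print(n_params(grouped))   # 96*24*3*3 + 96 bias = 20,832 (~4x fewer weights)
```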
Details
- Language :
- English
- ISSN :
- 20763417
- Volume :
- 14
- Issue :
- 14
- Database :
- Complementary Index
- Journal :
- Applied Sciences (2076-3417)
- Publication Type :
- Academic Journal
- Accession number :
- 178690650
- Full Text :
- https://doi.org/10.3390/app14146125