1. EERCA-ViT: Enhanced Effective Region and Context-Aware Vision Transformers for image sentiment analysis.
- Author
- Wang, Xiaohua; Yang, Jie; Hu, Min; and Ren, Fuji
- Subjects
- *IMAGE representation, *IMAGE processing, *IMAGE recognition (Computer vision), *OPTICAL pattern recognition, *COMPUTER vision
- Abstract
- Different parts of an image guide emotion to different degrees, some strongly and some weakly. The key to image emotion recognition is therefore to fully exploit the regions associated with emotions. This paper proposes a two-branch visual sentiment classification model based on the Vision Transformer, termed Enhanced Effective Region and Context-Aware Vision Transformers (EERCA-ViT). The model consists of a primary branch and an auxiliary branch. The primary branch models interdependencies between patches by squeezing and exciting patches (P-SE), thereby highlighting effective region features in the global feature. The auxiliary branch uses the context-aware module (CAM) to remove feature patches that have been tagged by the primary branch, forcing the network to discover different sentiment-discriminative regions. A two-part loss function is also constructed to improve the robustness of the model. Extensive experiments on six benchmark datasets show that the proposed method outperforms state-of-the-art image sentiment analysis methods, and extensive comparative experiments demonstrate the effectiveness of the individual modules of the framework (P-SE and CAM).
• Removing focused regions can guide the network to capture effective contextual sentiment features.
• Fully mining the features of sentiment-related regions is one of the key approaches to sentiment classification.
• The method's effectiveness and generalisability were confirmed through extensive experiments on eight datasets.
[ABSTRACT FROM AUTHOR]
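The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify layer shapes, the squeeze statistic, or how patches are "tagged", so the code below assumes a mean-over-channels squeeze, a two-layer bottleneck for the excitation, and removal of the `k` patches with the largest gates. The function names `patch_se` and `drop_attended`, the weights `w1` and `w2`, and the parameter `k` are all hypothetical.

```python
import math
import random


def patch_se(tokens, w1, w2):
    """Patch-wise squeeze-and-excitation (assumed form, not the paper's code).

    tokens: N patches, each a list of C floats.
    w1: r x N reduction weights, w2: N x r expansion weights (hypothetical).
    Returns (reweighted tokens, per-patch gates in (0, 1)).
    """
    # Squeeze: one scalar descriptor per patch (mean over its channels).
    z = [sum(t) / len(t) for t in tokens]
    # Excitation: bottleneck across the patch axis, ReLU then sigmoid.
    h = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * hi for w, hi in zip(row, h))))
        for row in w2
    ]
    # Scale: multiply every patch by its gate.
    scaled = [[g * x for x in t] for g, t in zip(gates, tokens)]
    return scaled, gates


def drop_attended(tokens, gates, k):
    """Assumed CAM step: remove the k patches with the largest gates.

    Returns the remaining patches (original order) and their indices.
    """
    order = sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)
    removed = set(order[:k])
    kept = [i for i in range(len(tokens)) if i not in removed]
    return [tokens[i] for i in kept], kept


# Toy run with random features and weights (illustration only).
random.seed(0)
N, C, r = 6, 4, 2
demo_tokens = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(N)]
demo_w1 = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(r)]
demo_w2 = [[random.uniform(-1, 1) for _ in range(r)] for _ in range(N)]

primary, demo_gates = patch_se(demo_tokens, demo_w1, demo_w2)
auxiliary, kept_idx = drop_attended(demo_tokens, demo_gates, k=2)
print(len(primary), len(auxiliary), kept_idx)
```

In this sketch the primary branch keeps all patches but reweights them, while the auxiliary branch sees only the patches the primary branch weighted least, which is one way to force attention onto different regions. Whether the paper removes patches before or after reweighting is not stated in the abstract.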
- Published
- 2023