Self-supervised modal optimization transformer for image captioning.
- Source :
- Neural Computing & Applications. Nov 2024, Vol. 36, Issue 31, pp. 19863-19878. 16 pp.
- Publication Year :
- 2024
Abstract
- In multimodal data processing for image captioning, data from different modalities usually exhibit distinct feature distributions. This gap between unimodal representations makes cross-modal mappings difficult to capture in multimodal learning. Current image captioning models transform images into captions directly; however, this approach requires large amounts of training data and yields limited performance when only a small quantity of multimodal data is available. In this paper, we introduce a novel self-supervised modal optimization transformer (SMOT) for image captioning. Specifically, we use self-supervised learning to build a cross-modal feature optimizer. Using raw images and their paired captions, this optimizer adjusts the distribution of semantic information in the image features so that it approaches the semantics of the caption. The optimized image features inherit information from both modalities, which reduces the disparity in feature distribution between modalities and decreases reliance on extensive training data. Furthermore, we fuse these features with image grid features and text features, using their complementary information to bridge the differences between features and to provide more comprehensive semantic guidance for image captioning. Experimental results demonstrate that SMOT outperforms state-of-the-art models when trained on limited data, showing efficient learning and good generalization on small training datasets. It also achieves competitive performance on the MSCOCO dataset, further highlighting its efficacy and potential for image captioning. [ABSTRACT FROM AUTHOR]
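The abstract does not state SMOT's training objective or fusion operator. The following minimal sketch only illustrates the general idea of pulling paired image and caption features together and then fusing feature sets. It assumes a CLIP-style contrastive (InfoNCE) loss as a stand-in for the self-supervised cross-modal optimizer and plain concatenation as the fusion step; `info_nce`, `fuse`, and the temperature of 0.1 are hypothetical choices, not the authors' method.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def info_nce(images: List[Vector], captions: List[Vector], temperature: float = 0.1) -> float:
    """Image-to-caption InfoNCE loss (assumed stand-in objective, not SMOT's).

    For image i, caption i is the positive; every other caption in the
    batch acts as a negative. Lower loss means paired features are closer.
    """
    total = 0.0
    for i, img in enumerate(images):
        logits = [cosine(img, cap) / temperature for cap in captions]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]  # negative log-softmax of the positive pair
    return total / len(images)


def fuse(optimized: Vector, grid: Vector, text: Vector) -> Vector:
    """Hypothetical fusion: concatenate optimized, grid, and text features."""
    return optimized + grid + text
```

With matched image and caption vectors, `info_nce` returns a value near zero; swapping the captions makes it large. That difference is the kind of signal a self-supervised optimizer would minimize to narrow the gap between the two modalities' feature distributions.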
- Subjects :
- *ELECTRONIC data processing
*GENERALIZATION
Details
- Language :
- English
- ISSN :
- 09410643
- Volume :
- 36
- Issue :
- 31
- Database :
- Academic Search Index
- Journal :
- Neural Computing & Applications
- Publication Type :
- Academic Journal
- Accession number :
- 179969932
- Full Text :
- https://doi.org/10.1007/s00521-024-10211-4