Image captioning with residual swin transformer and Actor-Critic.

Authors :
Zhou, Zhibo
Yang, Yang
Li, Zhoujun
Zhang, Xiaoming
Huang, Feiran
Source :
Neural Computing & Applications. Oct2022, p1-13.
Publication Year :
2022

Abstract

Image captioning is an essential task in the multi-modal area that combines computer vision and natural language processing to describe image content. Most current methods employ the encoder–decoder framework to achieve satisfactory results. Recently, transformers have been extensively used in image captioning and have also achieved satisfactory results. Nevertheless, transformers pay more attention to the global features of an image, dividing it into fixed-size patches and processing them separately. Such models cannot capture the relationships between the internal components of an image, such as those between objects, regions and object attributes. In this article, we introduce a novel Residual Swin Transformer and Actor-Critic (RSTAC) model for the image captioning task. RSTAC consists of two modules: a Residual Swin Transformer module and an Actor-Critic module. In particular, we first employ a residual network to preserve the vanilla features, and then use several residual Swin Transformer blocks with convolution operations to obtain the mid-level features; these blocks are themselves composed of a residual network, several Swin Transformer blocks and convolution operations. This module aims to explore the internal correlations within the image content and to obtain multi-view features of the image. Then, a policy network explores possible words and predicts the next word to generate. A value network estimates the reward of generated sentences, with the goal of directly optimizing non-differentiable quality metrics and improving the quality of the generated sentences. Experiments show that our model outperforms other competitive models on the MSCOCO and Flickr30k datasets. [ABSTRACT FROM AUTHOR]
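
The abstract describes the policy and value networks only in general terms. As an illustration of the generic actor-critic idea it names, and not the authors' implementation, the following minimal sketch computes the two losses for a single generated word. The function names, the toy scores, and the use of a squared-error value loss are all assumptions made for this example.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_losses(logits, chosen, reward, value_estimate):
    """Return (policy_loss, value_loss) for one generated word.

    All arguments are hypothetical stand-ins, not taken from the paper:
      logits         -- policy-network scores over a toy vocabulary
      chosen         -- index of the word that was actually generated
      reward         -- scalar score of the finished caption
      value_estimate -- value network's prediction of that reward
    """
    probs = softmax(logits)
    # Advantage: how much better the caption scored than the critic expected.
    advantage = reward - value_estimate
    # Actor: raise the log-probability of words with positive advantage.
    policy_loss = -advantage * math.log(probs[chosen])
    # Critic: squared error between predicted and observed reward (assumed).
    value_loss = (value_estimate - reward) ** 2
    return policy_loss, value_loss


if __name__ == "__main__":
    p_loss, v_loss = actor_critic_losses([0.0, 0.0], 0, 1.0, 0.5)
    print(p_loss, v_loss)
```

Because the reward enters only as a scalar weight on the log-probability, it does not need to be differentiable, which is why this family of methods can target caption quality metrics directly.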

Details

Language :
English
ISSN :
09410643
Database :
Academic Search Index
Journal :
Neural Computing & Applications
Publication Type :
Academic Journal
Accession number :
159497352
Full Text :
https://doi.org/10.1007/s00521-022-07848-4