MAENet: A novel multi-head association attention enhancement network for completing intra-modal interaction in image captioning.

Authors :
Hu, Nannan
Fan, Chunxiao
Ming, Yue
Feng, Fan
Source :
Neurocomputing. Jan 2023, Vol. 519, p69-81. 13p.
Publication Year :
2023

Abstract

Image captioning attracts much attention because it bridges computer vision and natural language processing. Recent work shows that transformer-based models with multi-head self-attention can explore intra-modal interactions to generate high-quality image captions. However, in these multi-head attention methods each attention head operates on its subspace independently, which ignores the association between attention heads and leaves the learning of intra-modal interaction incomplete. In this paper, we propose a Multi-head Association Attention Enhancement Network (MAENet) for image captioning, which leverages a novel Multi-head Association Attention Enhancement (MAE) block to complete intra-modal interaction learning. The proposed MAE block contains a Multi-head Association Attention (MAA) module and an Attention Enhancement (AE) module. The MAA module calculates the contributive weight of different attention heads and captures the associated information from adjacent attention subspaces via learned associative parameters. The AE module follows the MAA module and further enhances the association attention results through an additional spatial and channel-wise attention aggregation. Notably, the MAE block is a plug-and-play module that can be cascaded with other multi-head attention mechanisms. Extensive experiments on MS COCO show that our model achieves competitive performance; in particular, the model that cascades the MAE block with X-linear attention obtains the best-reported SPICE score of 23.5% on the Karpathy test split. This demonstrates that the proposed model can better model interactive information and produce superior captions. [ABSTRACT FROM AUTHOR]
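
The abstract does not give the MAA equations, so the following is only an illustrative sketch of the general idea it describes: compute standard multi-head self-attention, weight each head by a learned contribution, and mix in information from adjacent heads through learned associative parameters. The function names, the softmax over `head_logits`, the `(H, 2)` shape of `assoc`, and the circular notion of "adjacent" are all assumptions made for illustration, not the authors' formulation. The AE module is not sketched.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, num_heads):
    """Standard scaled dot-product self-attention.

    Returns the per-head outputs with shape (H, N, d_head), before concatenation.
    """
    n, d_model = x.shape
    d_head = d_model // num_heads

    def split(t):
        # (N, d_model) -> (H, N, d_head)
        return t.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (H, N, N)
    return softmax(scores, axis=-1) @ v                   # (H, N, d_head)

def associate_heads(heads, head_logits, assoc):
    """Hypothetical head association (an assumption, not the paper's MAA).

    heads:       (H, N, d_head) per-head attention outputs
    head_logits: (H,) learned scores, softmaxed into contribution weights
    assoc:       (H, 2) learned coefficients for the previous / next head
    """
    n = heads.shape[1]
    weights = softmax(head_logits)                # (H,)
    weighted = heads * weights[:, None, None]
    prev_head = np.roll(weighted, 1, axis=0)      # head h-1 (wraps around)
    next_head = np.roll(weighted, -1, axis=0)     # head h+1 (wraps around)
    mixed = (weighted
             + assoc[:, 0, None, None] * prev_head
             + assoc[:, 1, None, None] * next_head)
    # Concatenate heads back to (N, H * d_head).
    return mixed.transpose(1, 0, 2).reshape(n, -1)

# Toy usage with random inputs standing in for learned parameters.
rng = np.random.default_rng(0)
n, d_model, num_heads = 5, 16, 4
x = rng.standard_normal((n, d_model))
w_q, w_k, w_v = (rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(3))

heads = multi_head_attention(x, w_q, w_k, w_v, num_heads)
out = associate_heads(heads, np.zeros(num_heads), np.full((num_heads, 2), 0.1))
print(out.shape)  # (5, 16)
```

With `assoc` set to zero the sketch reduces to a per-head reweighting of ordinary multi-head attention, which is one way to see how such a block could be cascaded with an existing attention mechanism without changing its output shape.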

Details

Language :
English
ISSN :
0925-2312
Volume :
519
Database :
Academic Search Index
Journal :
Neurocomputing
Publication Type :
Academic Journal
Accession number :
160539609
Full Text :
https://doi.org/10.1016/j.neucom.2022.11.045