Cross-modal independent matching network for image-text retrieval.

Authors :
Ke, Xiao
Chen, Baitao
Yang, Xiong
Cai, Yuhang
Liu, Hao
Guo, Wenzhong
Source :
Pattern Recognition. Mar2025, Vol. 159, pN.PAG-N.PAG. 1p.
Publication Year :
2025

Abstract

Image-text retrieval serves as a bridge connecting vision and language. Mainstream cross matching methods perform effective cross-modal interactions and offer high theoretical performance, but they are inefficient. Modal independent matching methods are more efficient but lag in performance. Balancing matching efficiency and performance is therefore a challenge in image-text retrieval. In this paper, we propose a new Cross-modal Independent Matching Network (CIMN) for image-text retrieval. Specifically, we first use the proposed Feature Relationship Reasoning (FRR) to infer the neighborhood and potential relations of modal features. Then, we introduce Graph Pooling (GP), based on graph convolutional networks, to perform modal global semantic aggregation. Finally, we introduce the Gravitation Loss (GL), which incorporates sample mass into the learning process. This loss corrects the matching relationships both between and within modalities, avoiding the equal treatment of all samples in the traditional triplet loss. Extensive experiments on the Flickr30K and MSCOCO datasets demonstrate the superiority of the proposed method. It achieves a good balance between matching efficiency and performance, outperforms other independent matching methods, and obtains retrieval accuracy comparable to some mainstream cross matching methods with an order of magnitude lower inference time.

• NRR and PRR form FRR, enabling efficient global relationship reasoning at lower cost.
• Graph Pooling uses graph structures for efficient global semantic aggregation.
• Sample mass and Gravitation Loss improve diverse matching in image-text retrieval.
• CIMN achieves competitive performance on MSCOCO and Flickr30K with high efficiency.
[ABSTRACT FROM AUTHOR]
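The abstract attributes CIMN's speed to independent matching and contrasts its Gravitation Loss with the traditional triplet loss. The following is a minimal Python sketch of those two baseline ideas only, not of CIMN: it assumes each modality has already been encoded into a vector separately, so similarity reduces to dot products of L2-normalized embeddings, and it shows a conventional bidirectional hinge triplet loss in which every violating negative is weighted equally. The function names, toy vectors, 0.2 margin, and sum-over-negatives variant are assumptions for illustration (some methods keep only the hardest negative). The record does not give the Gravitation Loss formula, so it is not reproduced here.

```python
import math
from typing import List

# Hypothetical helpers illustrating the baseline setup, NOT the paper's method.
Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def similarity_matrix(images: List[Vector], texts: List[Vector]) -> List[List[float]]:
    """Cosine similarity between every image and every text embedding.

    With independent matching, each modality is embedded on its own, so gallery
    embeddings can be precomputed and retrieval is just these dot products,
    with no joint image-text forward pass per candidate pair.
    """
    imgs = [l2_normalize(v) for v in images]
    txts = [l2_normalize(v) for v in texts]
    return [[sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]


def triplet_ranking_loss(sim: List[List[float]], margin: float = 0.2) -> float:
    """Conventional bidirectional hinge triplet loss, summed over all negatives.

    Assumes pair k is (image k, text k), so diagonal entries are positives and
    off-diagonal entries are negatives. Every violating negative contributes
    with the same weight (the "equal treatment" the abstract refers to).
    """
    n = len(sim)
    loss = 0.0
    for k in range(n):
        pos = sim[k][k]
        for j in range(n):
            if j == k:
                continue
            # Image k as query: text j should score at least `margin` below text k.
            loss += max(0.0, margin - pos + sim[k][j])
            # Text k as query: image j should score at least `margin` below image k.
            loss += max(0.0, margin - pos + sim[j][k])
    return loss
```

With two toy pairs whose embeddings are aligned, the similarity matrix is the identity and the loss is 0. If the text embeddings are swapped, every negative outscores its positive, each of the four hinge terms contributes 1.2, and the loss is about 4.8.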

Subjects

Subjects :
*GRAVITATION
*NEIGHBORHOODS

Details

Language :
English
ISSN :
0031-3203
Volume :
159
Database :
Academic Search Index
Journal :
Pattern Recognition
Publication Type :
Academic Journal
Accession number :
181410936
Full Text :
https://doi.org/10.1016/j.patcog.2024.111096