Unpaired referring expression grounding via bidirectional cross-modal matching.

Authors :
Shi, Hengcan
Hayat, Munawar
Cai, Jianfei
Source :
Neurocomputing. Jan 2023, Vol. 518, p. 39-49. 11 p.
Publication Year :
2023

Abstract

Referring expression grounding is an important and challenging task in computer vision. To avoid the laborious annotation required by conventional referring grounding, unpaired referring grounding has been introduced, where the training data contains only images and queries, with no correspondences between them. The few existing solutions to unpaired referring grounding are still preliminary, owing to the difficulty of learning vision-language correlation and the lack of top-down guidance with unpaired data. Existing works can learn vision-language correlation only through modality conversion, in which critical information is lost. They also rely heavily on pre-extracted object proposals and thus cannot generate correct predictions when the proposals are defective. In this paper, we propose a novel bidirectional cross-modal matching (BiCM) framework to address these challenges. In particular, we design a query-aware attention map (QAM) module that introduces a top-down perspective by generating query-specific visual attention maps, avoiding over-reliance on pre-extracted object proposals. A cross-modal object matching (COM) module is further introduced to predict the target objects from a bottom-up perspective. This module exploits CLIP, a recently introduced pretrained image-text matching model, to learn cross-modal correlation without modality conversion. The top-down and bottom-up predictions are then integrated via a similarity fusion (SF) module. We also propose a knowledge adaptation matching (KAM) module that leverages unpaired training data to adapt pretrained knowledge to the target dataset and task. Experiments show that our framework significantly outperforms previous works on five grounding datasets. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
0925-2312
Volume :
518
Database :
Academic Search Index
Journal :
Neurocomputing
Publication Type :
Academic Journal
Accession number :
160438132
Full Text :
https://doi.org/10.1016/j.neucom.2022.10.079