One-Stage Visual Relationship Referring With Transformers and Adaptive Message Passing.

Authors :
Wang, Hang
Du, Youtian
Zhang, Yabin
Li, Shuai
Zhang, Lei
Source :
IEEE Transactions on Image Processing; 2023, Vol. 32, p190-202, 13p
Publication Year :
2023

Abstract

A variety of visual relationships exist among the entities in an image. Given a relationship query $\langle subject, predicate, object \rangle$, the task of visual relationship referring (VRR) is to disambiguate instances of the same entity category and simultaneously localize the subject and object entities in an image. Previous VRR methods can be broadly categorized as one-stage or multi-stage. One-stage methods localize a pair of entities directly from the image but suffer from low prediction accuracy, while multi-stage methods perform better but are indirect: they pre-generate a large number of candidate proposals in order to localize only a pair of entities. In this paper, we formulate VRR as an end-to-end bounding box regression problem and propose a novel one-stage approach, called VRR-TAMP, that integrates Transformers with an adaptive message passing mechanism. First, visual relationship queries and images are encoded separately to generate basic modality-specific embeddings, which are then fed into a cross-modal Transformer encoder to produce a joint representation. Second, to obtain the specific representation of each entity, we introduce an adaptive message passing mechanism and design an entity-specific information distiller, SR-GMP, which refers to a gated message passing (GMP) module that works on the joint representation learned from a single learnable token. The GMP module adaptively distills the final representation of an entity by incorporating contextual cues regarding the predicate and the other entity. Experiments on the VRD and Visual Genome datasets demonstrate that our approach significantly outperforms its one-stage competitors and achieves results competitive with state-of-the-art multi-stage methods. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
1057-7149
Volume :
32
Database :
Complementary Index
Journal :
IEEE Transactions on Image Processing
Publication Type :
Academic Journal
Accession number :
160960786
Full Text :
https://doi.org/10.1109/TIP.2022.3226624