Bidirectional Relationship Inferring Network for Referring Image Localization and Segmentation

Authors :
Zhiwei Hu
Huchuan Lu
Jiayu Sun
Guang Feng
Lihe Zhang
Source :
IEEE Transactions on Neural Networks and Learning Systems. 34:2246-2258
Publication Year :
2023
Publisher :
Institute of Electrical and Electronics Engineers (IEEE), 2023.

Abstract

Recently, referring image localization and segmentation has aroused widespread interest. However, the existing methods lack a clear description of the interdependence between language and vision. To this end, we present a bidirectional relationship inferring network (BRINet) to effectively address the challenging tasks. Specifically, we first employ a vision-guided linguistic attention module to perceive the keywords corresponding to each image region. Then, language-guided visual attention adopts the learned adaptive language to guide the update of the visual features. Together, they form a bidirectional cross-modal attention module (BCAM) to achieve the mutual guidance between language and vision. They can help the network align the cross-modal features better. Based on the vanilla language-guided visual attention, we further design an asymmetric language-guided visual attention, which significantly reduces the computational cost by modeling the relationship between each pixel and each pooled subregion. In addition, a segmentation-guided bottom-up augmentation module (SBAM) is utilized to selectively combine multilevel information flow for object localization. Experiments show that our method outperforms other state-of-the-art methods on three referring image localization datasets and four referring image segmentation datasets.
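The cost saving described for the asymmetric attention can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no equations, so the average pooling, the scaled dot-product scoring, and the names `avg_pool_regions`, `asymmetric_attention`, and `grid` are all assumptions made for illustration, and the language guidance and learned projections are omitted. The sketch only shows how attending from each pixel to a handful of pooled subregions needs H*W*grid^2 score computations rather than the (H*W)^2 needed for pixel-to-pixel attention.

```python
import math


def avg_pool_regions(feat, grid):
    """Average-pool an H x W x C feature map into grid*grid region vectors.

    Assumption: H and W are divisible by grid.
    """
    h, w, c = len(feat), len(feat[0]), len(feat[0][0])
    rh, rw = h // grid, w // grid
    regions = []
    for gy in range(grid):
        for gx in range(grid):
            acc = [0.0] * c
            for y in range(gy * rh, (gy + 1) * rh):
                for x in range(gx * rw, (gx + 1) * rw):
                    for k in range(c):
                        acc[k] += feat[y][x][k]
            regions.append([v / (rh * rw) for v in acc])
    return regions


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def asymmetric_attention(feat, grid):
    """Let every pixel attend to pooled subregions instead of all pixels.

    Hypothetical scoring: scaled dot product between pixel and region vectors.
    """
    regions = avg_pool_regions(feat, grid)
    c = len(regions[0])
    scale = 1.0 / math.sqrt(c)
    out = []
    for row in feat:
        out_row = []
        for pixel in row:
            scores = [scale * sum(p * r for p, r in zip(pixel, region))
                      for region in regions]
            weights = softmax(scores)
            # Weighted sum of region vectors replaces the pixel feature.
            out_row.append([sum(wt * region[k]
                                for wt, region in zip(weights, regions))
                            for k in range(c)])
        out.append(out_row)
    return out


# Toy 4 x 4 map with 2 channels, pooled into a 2 x 2 grid of subregions.
feat = [[[float(y + x), float(y - x)] for x in range(4)] for y in range(4)]
out = asymmetric_attention(feat, grid=2)
```

On this toy input each of the 16 pixels is scored against 4 pooled regions (64 scores) instead of against all 16 pixels (256 scores); the gap widens quickly at realistic feature-map sizes.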

Details

ISSN :
2162-2388 and 2162-237X
Volume :
34
Database :
OpenAIRE
Journal :
IEEE Transactions on Neural Networks and Learning Systems
Accession number :
edsair.doi.dedup.....be3b14d04e9ece3b871d83c60e61f35b
Full Text :
https://doi.org/10.1109/tnnls.2021.3106153