
Text-based person search via local-relational-global fine grained alignment.

Authors :: Zhou, Junfeng
Huang, Baigang
Fan, Wenjiao
Cheng, Ziqian
Zhao, Zhuoyi
Zhang, Weifeng
Source :: Knowledge-Based Systems. Feb2023, Vol. 262, pN.PAG-N.PAG. 1p.
Publication Year :: 2023
Abstract: The core difficulty of text-based person search is achieving fine-grained alignment of visual and linguistic data, so as to bridge the heterogeneity gap between the two modalities. Most existing work on this task focuses on global and local feature extraction and matching, ignoring the importance of relational information. This paper proposes a new text-based person search model, named CM-LRGNet, which extracts Cross-Modal Local-Relational-Global features in an end-to-end manner and performs fine-grained cross-modal alignment at all three feature levels. Concretely, we first split the convolutional feature maps to obtain local image features, and adaptively extract local textual features. A relation encoding module is then proposed to implicitly learn the relational information contained in images and texts. Finally, a relation-aware graph attention network is designed to fuse the local and relational features into global representations for both images and text queries. Extensive experiments on the benchmark dataset CUHK-PEDES show that, by learning and aligning local-relational-global representations across modalities, our approach achieves state-of-the-art performance (64.18%, 82.97%, and 89.85% Top-1, Top-5, and Top-10 accuracy, respectively). Our code has been released at https://github.com/zhangweifeng1218/Text-based-Person-Search. [ABSTRACT FROM AUTHOR]
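The Top-1, Top-5 and Top-10 figures quoted above follow the retrieval protocol commonly used for CUHK-PEDES: each text query ranks every gallery image by similarity, and the query counts as a hit at rank k if at least one of its k highest-scoring images shows the same person. The sketch below illustrates that metric in plain Python. It is not the authors' evaluation code: it assumes a single embedding vector per text and per image scored by cosine similarity, whereas CM-LRGNet aligns local, relational and global features. The function names and toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k_accuracy(text_feats: List[List[float]], text_ids: List[int],
                   image_feats: List[List[float]], image_ids: List[int],
                   ks: Sequence[int] = (1, 5, 10)) -> Dict[int, float]:
    """Fraction of text queries whose top-k ranked gallery images include
    at least one image with the query's person identity.

    Assumption: one global vector per sample, scored by cosine similarity.
    """
    hits = {k: 0 for k in ks}
    for query, qid in zip(text_feats, text_ids):
        # Rank all gallery images by descending similarity to this query.
        ranked = sorted(range(len(image_feats)),
                        key=lambda i: cosine(query, image_feats[i]),
                        reverse=True)
        # 0-based rank of the first image with the correct identity, if any.
        first = next((r for r, i in enumerate(ranked) if image_ids[i] == qid), None)
        for k in ks:
            if first is not None and first < k:
                hits[k] += 1
    return {k: hits[k] / len(text_feats) for k in ks}


# Hypothetical toy gallery: three identities, one image each.
images = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_ids = [0, 1, 2]
# Hypothetical text queries; the third one ranks its true match second.
texts = [[1.0, 0.1], [0.0, 1.0], [1.0, 0.0]]
text_ids = [0, 1, 2]

acc = top_k_accuracy(texts, text_ids, images, image_ids, ks=(1, 2, 3))
print(acc)  # → {1: 0.6666666666666666, 2: 1.0, 3: 1.0}
```

Counting a query as a hit once any correct image appears within the top k is what makes the Top-k numbers monotonically non-decreasing in k, which matches the ordering of the reported 64.18%, 82.97% and 89.85%.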