
Exploring the diversity and invariance in yourself for visual pre-training task.

Authors :
Wei, Longhui
Xie, Lingxi
Zhou, Wengang
Li, Houqiang
Tian, Qi
Source :
Pattern Recognition. Jul 2023, Vol. 139.
Publication Year :
2023

Abstract

• An original work on exploring the region-level diversity and invariance inside images for better learning of visual pre-training knowledge.
• Proposes a simple but effective approach to fully extract the multi-grained visual knowledge inside each image.
• Extensive experiments show the superiority of the proposed method over recent state-of-the-art visual pre-training methods.

Recently, self-supervised learning methods have achieved remarkable success in the visual pre-training task. By simply pulling the different augmented views of each image together, or through other novel mechanisms, they can learn much unsupervised knowledge and significantly improve the transfer performance of pre-training models. However, these works still cannot avoid the representation collapse problem, i.e., they focus only on limited regions, or the features extracted from totally different regions inside each image are nearly the same. Generally, this problem prevents the pre-training models from sufficiently describing the multi-grained information inside images, which further limits the upper bound of their transfer performance. To alleviate this issue, this paper introduces a simple but effective mechanism, called Exploring the Diversity and Invariance in Yourself (E-DIY). By pushing the most different regions inside each augmented view away from each other, E-DIY preserves the diversity of the extracted region-level features. By pulling the most similar regions from different augmented views of the same image together, E-DIY ensures the robustness of the extracted region-level features. Benefiting from this diversity- and invariance-exploring mechanism, E-DIY extracts the multi-grained visual information inside each image better than previous self-supervised learning approaches. Extensive experiments on various downstream tasks have demonstrated the superiority of the method, e.g., a 1.9% improvement over the recent state-of-the-art method BYOL on the AP50 metric of the detection task on VOC. [ABSTRACT FROM AUTHOR]
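To make the abstract's mechanism concrete, the following is a minimal PyTorch sketch of the two region-level terms it describes: a diversity term that pushes the most different regions within one augmented view further apart, and an invariance term that pulls the most similar regions across the two views together. The function name e_diy_losses, the cosine-similarity matching rule, and the exact loss forms are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn.functional as F

    def e_diy_losses(regions_a, regions_b):
        # regions_a, regions_b: (N, D) region features from two augmented
        # views of the same image (N regions, D dimensions each).
        # NOTE: illustrative sketch only; not the paper's actual code.
        a = F.normalize(regions_a, dim=1)   # unit-norm region features, view A
        b = F.normalize(regions_b, dim=1)   # unit-norm region features, view B

        # Diversity: within view A, find each region's most dissimilar
        # region and penalize that pair's similarity, pushing the most
        # different regions even further apart.
        sim_aa = a @ a.t()                  # (N, N) within-view cosine similarities
        hardest, _ = sim_aa.min(dim=1)      # most different region per row
        diversity_loss = hardest.mean()     # minimizing keeps regions diverse

        # Invariance: match each region in view A to its most similar
        # region in view B and pull the matched pair together.
        sim_ab = a @ b.t()                  # (N, N) cross-view cosine similarities
        best, _ = sim_ab.max(dim=1)         # most similar cross-view region
        invariance_loss = (1.0 - best).mean()  # high similarity = robust feature

        return diversity_loss, invariance_loss

    # Toy usage: 16 regions with 128-dim features per view.
    va = torch.randn(16, 128)
    vb = torch.randn(16, 128)
    d_loss, i_loss = e_diy_losses(va, vb)
    total = d_loss + i_loss  # the weighting between the two terms is a free choice

In this reading, the diversity term prevents the collapse case the abstract mentions (all regions mapping to nearly identical features), while the invariance term supplies the usual cross-view consistency signal at the region level rather than the image level.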

Details

Language :
English
ISSN :
00313203
Volume :
139
Database :
Academic Search Index
Journal :
Pattern Recognition
Publication Type :
Academic Journal
Accession number :
162848476
Full Text :
https://doi.org/10.1016/j.patcog.2023.109437