1. Gradient aggregation based fine-grained image retrieval: A unified viewpoint for CNN and Transformer.
- Author
- Yu, Han; Lu, Huibin; Zhao, Min; Li, Zhuoyi; and Gu, Guanghua
- Subjects
- *TRANSFORMER models, *IMAGE retrieval, *FEATURE extraction, *DEEP learning
- Abstract
- The gradients of a CNN are traditionally used for optimization and visualization. In this paper, we find that a discriminative representation is hidden in the gradients of the convolution filters. Based on this finding, we propose a feature extraction and aggregation method for fine-grained image retrieval (FGIR). First, we propose a metric for evaluating manually designed loss functions and, based on it, design a loss function derived from Grad-CAM that is applied in the testing phase to extract the gradients of the convolution filters. Second, we take these gradients as new features and design a succinct approach to aggregate them into a compact vector, named the Convolution Filters Gradient Aggregation (CFGA) feature. CFGA features can be extracted from both pre-trained and fine-tuned CNN models. Extensive experiments on FGIR verify the effectiveness of the proposed CFGA approach against five supervised state-of-the-art methods and two unsupervised methods on two standard fine-grained retrieval datasets. Moreover, we generalize the CFGA method from CNNs to the Swin Transformer and propose the Transformer Parameter Gradients Aggregation (TPGA) method, which shows that the core idea of CFGA/TPGA applies to mainstream feature extraction models. We achieve state-of-the-art FGIR performance on the CUB-200-2011 and CARS196 datasets.
  • We design a novel feature aggregation method to obtain the representation.
  • We propose a visual metric for evaluating loss functions.
  • The proposed method is widely adaptable.
  [ABSTRACT FROM AUTHOR]
- Published
- 2024
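The abstract describes taking test-time gradients of convolution filters as features and aggregating them into a compact vector, but it does not state the loss function or the aggregation rule. The following is therefore only a minimal sketch of the general idea, not the authors' CFGA method. Everything specific in it is an assumption made for illustration: the 1x1 filters, the label-free stand-in score (mean over positions of the summed ReLU activations), the per-filter summation used as the aggregation step, and the helper names `filter_gradients`, `aggregate`, `descriptor`, and `rank`.

```python
import math

def filter_gradients(x, W):
    """Gradient of S = mean_n sum_k relu(W[k] . x[:, n]) w.r.t. each 1x1 filter.

    x: C x N feature map (C channels, N spatial positions), as nested lists.
    W: K x C filter weights. Both the score S and the 1x1 filters are
    illustrative assumptions, not taken from the paper.
    """
    C, N = len(x), len(x[0])
    grads = []
    for w_k in W:
        g_k = [0.0] * C
        for n in range(N):
            pre = sum(w_k[c] * x[c][n] for c in range(C))
            if pre > 0:  # ReLU passes gradient only where the pre-activation is positive
                for c in range(C):
                    g_k[c] += x[c][n] / N
        grads.append(g_k)
    return grads

def aggregate(grads):
    """Assumed aggregation: sum each filter's gradient, then L2-normalise."""
    v = [sum(g_k) for g_k in grads]
    norm = math.sqrt(sum(a * a for a in v)) or 1.0  # avoid dividing by zero
    return [a / norm for a in v]

def descriptor(x, W):
    """One compact K-dimensional vector per image."""
    return aggregate(filter_gradients(x, W))

def rank(query, gallery, W):
    """Return gallery names sorted by cosine similarity to the query, best first."""
    q = descriptor(query, W)
    scores = {}
    for name, x in gallery.items():
        d = descriptor(x, W)
        scores[name] = sum(a * b for a, b in zip(q, d))  # vectors are unit length
    return sorted(scores, key=scores.get, reverse=True)

# Toy data: 2 channels, 2 spatial positions, 3 filters.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
query = [[1.0, 2.0], [0.5, 0.0]]
gallery = {
    "same_scaled": [[2.0, 4.0], [1.0, 0.0]],  # query scaled by 2
    "other": [[0.0, 0.1], [3.0, 1.0]],
}
print(rank(query, gallery, W))
```

In a real setting the gradients would come from a deep network via automatic differentiation rather than the hand-derived formula used here; the sketch only shows the shape of the pipeline: compute a score at test time, differentiate it with respect to the filters, aggregate, and compare descriptors.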