FlexFormer: Flexible Transformer for efficient visual recognition.

Authors :
Fan, Xinyi
Liu, Huajun
Source :
Pattern Recognition Letters. May 2023, Vol. 169, pp. 95-101. 7p.
Publication Year :
2023

Abstract

• Conv-MSA, based on contextual-query dot-products, improves fine-grained feature representation for self-attention.
• A Tanh-Softmax hybrid nonlinearization method in linear self-attention enables fast convergence on visual recognition tasks.
• The FlexFormer model achieves state-of-the-art recognition accuracy on several benchmarks.
• The FlexFormer model uses fewer parameters and offers higher computational efficiency.

Vision Transformers have shown overwhelming superiority over convolutional neural networks in the computer vision community. Nevertheless, the understanding of multi-head self-attention, the de facto core ingredient of Transformers, is still limited, which has led to surging interest in explaining its core mechanism. A notable theory holds that, unlike high-frequency-sensitive convolutions, self-attention behaves like a generalized spatial smoothing and blurs high spatial-frequency signals as depth increases. In this paper, we design a Conv-MSA structure to extract efficient local contextual information and remedy this inherent drawback of self-attention. Accordingly, a flexible Transformer structure named FlexFormer, with computational complexity linear in the input image size, is proposed. Experimental results on several visual recognition benchmarks show that FlexFormer achieves state-of-the-art results on visual recognition tasks with fewer parameters and higher computational efficiency. [ABSTRACT FROM AUTHOR]
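The abstract names a Tanh-Softmax hybrid nonlinearization for linear self-attention but does not give its formula. Purely as a hedged illustration of the general idea, the PyTorch sketch below applies tanh to the queries and a token-wise softmax to the keys so that attention cost grows linearly with the number of tokens; the function name and the exact placement of the two nonlinearities are assumptions for illustration, not the authors' formulation.

    import torch
    import torch.nn.functional as F

    def linear_tanh_softmax_attention(q, k, v):
        # q, k, v: (batch, heads, tokens, dim). Hypothetical sketch:
        # tanh bounds the query features (assumed placement), while
        # softmax over the token axis normalizes the keys (assumed),
        # so the key-value product can be formed first.
        q = torch.tanh(q)
        k = F.softmax(k, dim=-2)
        # (dim, dim) context matrix: cost is O(tokens * dim^2),
        # linear in the number of tokens instead of quadratic.
        context = torch.einsum('bhnd,bhne->bhde', k, v)
        return torch.einsum('bhnd,bhde->bhne', q, context)

Because the key-value product is computed before the queries are applied, the token-by-token attention matrix is never materialized, which is what gives linear-attention variants of this kind their efficiency.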

Details

Language :
English
ISSN :
0167-8655
Volume :
169
Database :
Academic Search Index
Journal :
Pattern Recognition Letters
Publication Type :
Academic Journal
Accession Number :
163308878
Full Text :
https://doi.org/10.1016/j.patrec.2023.03.028