Efficient image analysis with triple attention vision transformer.
- Source :
- Pattern Recognition, Jun 2024, Vol. 150
- Publication Year :
- 2024
Abstract
- This paper introduces TrpViT, a novel triple attention vision transformer that efficiently captures both local and global features. The proposed architecture addresses global information acquisition by employing three complementary attention mechanisms within a single attention block: Window, Dilated, and Channel attention. This attention block extracts spatially local features while expanding the receptive field to capture richer global context. By integrating the attention block with convolution, a new C-C-T-T architecture is formed. We rigorously evaluate TrpViT, demonstrating state-of-the-art performance on various computer vision tasks, including image classification, 2D and 3D object detection, instance segmentation, and low-level image colorization. Notably, TrpViT achieves strong accuracy across all parameter scales, highlighting its computational efficiency and effectiveness.
  • A Triple Attention Vision Transformer captures both global and local features.
  • TrpViT integrates convolution into the transformer to provide inductive bias.
  • TrpViT compensates for non-local information using multiple complementary attentions.
  • TrpViT achieves state-of-the-art results across both high-level and low-level tasks. [ABSTRACT FROM AUTHOR]
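The abstract only names the three attention branches, so the sketch below is an illustrative reading rather than the authors' implementation: it assumes the Window branch attends within non-overlapping local windows, the Dilated branch attends among pixels that share the same stride phase (a sparse, long-range pattern), and the Channel branch applies squeeze-and-excitation-style gating over channels. Only the attention block is sketched, not the full C-C-T-T stack, and all identifiers (TripleAttentionBlock, window_size, dilation, etc.) are hypothetical.

```python
# Illustrative sketch of a triple-attention block (window + dilated + channel),
# assuming PyTorch; the branch designs below are assumptions, not the paper's code.
import torch
import torch.nn as nn


def window_partition(x, ws):
    """(B, H, W, C) -> (num_windows*B, ws*ws, C) over non-overlapping windows."""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)


def window_reverse(win, ws, B, H, W, C):
    """Inverse of window_partition."""
    x = win.view(B, H // ws, W // ws, ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)


class TripleAttentionBlock(nn.Module):
    def __init__(self, dim, num_heads=4, window_size=7, dilation=2):
        super().__init__()
        self.ws, self.dil = window_size, dilation
        self.norm = nn.LayerNorm(dim)
        self.win_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.dil_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Channel branch: squeeze-and-excitation-style gating (an assumption).
        self.ch_gate = nn.Sequential(
            nn.Linear(dim, dim // 4), nn.GELU(), nn.Linear(dim // 4, dim), nn.Sigmoid()
        )
        self.proj = nn.Linear(3 * dim, dim)

    def forward(self, x):
        # x: (B, H, W, C); H and W must be divisible by window_size and dilation.
        B, H, W, C = x.shape
        y = self.norm(x)

        # 1) Window attention: self-attention inside each local window.
        w = window_partition(y, self.ws)
        w, _ = self.win_attn(w, w, w)
        out_win = window_reverse(w, self.ws, B, H, W, C)

        # 2) Dilated attention: pixels sharing the same (i % dil, j % dil) phase
        #    attend to each other, enlarging the receptive field sparsely.
        d = self.dil
        z = y.view(B, H // d, d, W // d, d, C).permute(0, 2, 4, 1, 3, 5)
        z = z.reshape(B * d * d, (H // d) * (W // d), C)
        z, _ = self.dil_attn(z, z, z)
        z = z.view(B, d, d, H // d, W // d, C).permute(0, 3, 1, 4, 2, 5)
        out_dil = z.reshape(B, H, W, C)

        # 3) Channel attention: a pooled descriptor gates each channel.
        g = y.mean(dim=(1, 2))                              # (B, C)
        out_ch = y * self.ch_gate(g)[:, None, None, :]

        # Fuse the three complementary branches and add the residual.
        return x + self.proj(torch.cat([out_win, out_dil, out_ch], dim=-1))


# Quick shape check: 28 is divisible by both window_size=7 and dilation=2.
block = TripleAttentionBlock(dim=64, num_heads=4, window_size=7, dilation=2)
x = torch.randn(2, 28, 28, 64)
print(block(x).shape)  # torch.Size([2, 28, 28, 64])
```

In the paper's C-C-T-T layout, blocks like this would presumably occupy the two transformer stages, with convolutional stages before them supplying the inductive bias noted in the highlights; that pairing is inferred from the abstract, not from the published code.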
Details
- Language :
- English
- ISSN :
- 0031-3203
- Volume :
- 150
- Database :
- Academic Search Index
- Journal :
- Pattern Recognition
- Publication Type :
- Academic Journal
- Accession Number :
- 175963884
- Full Text :
- https://doi.org/10.1016/j.patcog.2024.110357