DSCAFormer: Lightweight Vision Transformer With Dual-Branch Spatial Channel Aggregation
- Source :
- IEEE Access, Vol. 12, pp. 75272-75288 (2024)
- Publication Year :
- 2024
- Publisher :
- IEEE, 2024.
Abstract
- Vision Transformer (ViT) models achieve strong performance and are widely popular in the vision field. However, their computational cost and model size often limit their use, making the study of lightweight models critical. Existing lightweight ViT models often exhibit a weak dependence on high-frequency information and lack an effective multi-frequency fusion mechanism; compensating for these shortcomings typically requires additional computational resources and larger model scale. To address this, this paper proposes the DSCAFormer model, a novel architecture comprising a local convolutional block with attention-like algorithms and a global spatial transformer block. These components use a channel splitting strategy with a gating mechanism to merge local and global information, forming a multi-frequency spatial feature extractor that effectively integrates both local and global information from images. Furthermore, a channel aggregation method is introduced to enhance the extraction of spatial information within the channel space, enabling spatial feature perception within the context and the allocation of multi-feature computation. Comprehensive experiments on classification, detection, and instance segmentation demonstrate the DSCAFormer model’s scalability and competitiveness. For instance, on the ImageNet-1K dataset, DSCAFormer models with 2.5M, 4.3M, and 7.4M parameters achieved top-1 accuracies of 72.5%, 76.7%, and 79.5%, respectively, with 0.4G, 0.7G, and 1.2G FLOPs. These results outperform the MobileViTv2-0.5/0.75/1.0 models, with respective accuracy improvements of 2.3%, 1.1%, and 1.4%, and reductions in FLOPs of 20%, 30%, and 33%. In addition, DSCAFormer also shows competitive performance in downstream tasks.
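
The dual-branch design described in the abstract, channel splitting into a local convolutional branch and a global transformer branch, fused by a gating mechanism and a channel aggregation step, can be illustrated with a minimal sketch. The PyTorch block below is a hypothetical reconstruction based only on the abstract, not the authors' released code: the class name `DualBranchBlock`, the depthwise convolution for the local branch, the multi-head self-attention for the global branch, and the sigmoid gate are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class DualBranchBlock(nn.Module):
    """Hypothetical sketch of a dual-branch spatial channel aggregation block.

    Channels are split in half: one half passes through a local depthwise
    convolution (high-frequency detail), the other through global multi-head
    self-attention (low-frequency context). A learned sigmoid gate fuses the
    two halves before a pointwise "channel aggregation" projection.
    """

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        assert dim % 2 == 0, "dim must be even to split channels in half"
        half = dim // 2
        # Local branch: depthwise 3x3 conv over half the channels.
        self.local = nn.Conv2d(half, half, 3, padding=1, groups=half)
        # Global branch: self-attention over flattened spatial tokens.
        self.norm = nn.LayerNorm(half)
        self.attn = nn.MultiheadAttention(half, num_heads, batch_first=True)
        # Gating mechanism: per-channel sigmoid weights on the merged features.
        self.gate = nn.Sequential(nn.Conv2d(dim, dim, 1), nn.Sigmoid())
        # Channel aggregation: pointwise conv mixes information across channels.
        self.aggregate = nn.Conv2d(dim, dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        B, C, H, W = x.shape
        x_loc, x_glb = x.chunk(2, dim=1)            # channel splitting
        loc = self.local(x_loc)                     # local (high-frequency) path
        tokens = x_glb.flatten(2).transpose(1, 2)   # (B, H*W, C//2)
        glb, _ = self.attn(*(self.norm(tokens),) * 3)  # global self-attention
        glb = glb.transpose(1, 2).reshape(B, C // 2, H, W)
        fused = torch.cat([loc, glb], dim=1)        # merge the two branches
        fused = fused * self.gate(fused)            # gated multi-frequency fusion
        return x + self.aggregate(fused)            # residual + channel aggregation

# Usage: one block on a 56x56 feature map with 64 channels.
block = DualBranchBlock(dim=64, num_heads=4)
y = block(torch.randn(1, 64, 56, 56))  # -> shape (1, 64, 56, 56)
```

This sketch only captures the structural idea of splitting channels between a cheap local path and an attention-based global path; the paper's actual attention-like local algorithm, gating formulation, and aggregation details may differ.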
Details
- Language :
- English
- ISSN :
- 2169-3536
- Volume :
- 12
- Database :
- Directory of Open Access Journals
- Journal :
- IEEE Access
- Publication Type :
- Academic Journal
- Accession number :
- edsdoj.f6ed0f17632a4cc5b38affde0e31af27
- Document Type :
- article
- Full Text :
- https://doi.org/10.1109/ACCESS.2024.3406555