Survey on Visual Transformer for Image Classification

Authors :
PENG Bin, BAI Jing, LI Wenjing, ZHENG Hu, MA Xiangyu
Source :
Jisuanji kexue yu tansuo, Vol 18, Iss 2, Pp 320-344 (2024)
Publication Year :
2024
Publisher :
Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press, 2024.

Abstract

Transformer is a deep learning model based on the self-attention mechanism and has shown great potential in computer vision. In image classification, the key challenge is to capture both local and global features of the input image efficiently and accurately. Traditional approaches use convolutional neural networks to extract local features in the lower layers and enlarge the receptive field by stacking convolutional layers to obtain global features. However, this strategy aggregates information only over relatively short distances, which makes long-range dependencies difficult to model. In contrast, the self-attention mechanism of Transformer directly compares features across all spatial positions, capturing long-range dependencies at both local and global levels and giving it stronger global modeling capability. A thorough examination of the challenges Transformer faces in image classification is therefore important. Taking Vision Transformer as an example, this paper gives a detailed overview of the core principles and architecture of Transformer. It then focuses on image classification, summarizing key issues and recent advances in visual Transformer research in three areas: performance enhancement, computational cost, and training optimization. It also summarizes applications of Transformer in specific domains such as medical imagery, remote sensing, and agricultural images, highlighting its versatility. Finally, it presents a comprehensive analysis of research progress on visual Transformer for image classification and offers insights into future directions for its development.
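
The abstract's point that self-attention compares features across all spatial positions can be illustrated with a minimal sketch. This is not taken from the paper: the function names, the toy two-dimensional patch vectors, and the use of identity query/key/value projections are all assumptions made for brevity, and a real Vision Transformer would use learned projections, multiple heads, and positional embeddings.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of attention scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    # Single-head scaled dot-product self-attention.
    # Assumption: queries, keys, and values are the tokens themselves
    # (identity projections), which a trained model would not use.
    d = len(tokens[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for q in tokens:
        # Each query is scored against every token, near or far.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # The output is the attention-weighted average of all tokens.
        outputs.append(
            [sum(w[j] * tokens[j][c] for j in range(len(tokens))) for c in range(d)]
        )
    return outputs, weights

# Hypothetical embeddings for four image patches.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
attended, weights = self_attention(patches)
```

Every row of `weights` has one entry per patch, so each patch draws on all others in a single step, whereas a convolution would need stacked layers to connect distant positions.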

Details

Language :
Chinese
ISSN :
1673-9418
Volume :
18
Issue :
2
Database :
Directory of Open Access Journals
Journal :
Jisuanji kexue yu tansuo
Publication Type :
Academic Journal
Accession number :
edsdoj.0710fe2967b4e13905feb61ed449804
Document Type :
article
Full Text :
https://doi.org/10.3778/j.issn.1673-9418.2310092