
SegViT v2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers.

Authors :
Zhang, Bowen
Liu, Liyang
Phan, Minh Hieu
Tian, Zhi
Shen, Chunhua
Liu, Yifan
Source :
International Journal of Computer Vision. Apr 2024, Vol. 132, Issue 4, p1126-1147. 22p.
Publication Year :
2024

Abstract

This paper investigates the capability of plain Vision Transformers (ViTs) for semantic segmentation within the encoder–decoder framework and introduces SegViTv2. We propose a novel Attention-to-Mask (ATM) module to build a lightweight decoder that is effective for plain ViTs. The ATM module converts the global attention map into semantic masks for high-quality segmentation results. Our decoder outperforms the popular UPerNet decoder across various ViT backbones while consuming only about 5% of its computational cost. On the encoder side, we address the relatively high computational cost of ViT-based encoders and propose a Shrunk++ structure that incorporates edge-aware query-based down-sampling (EQD) and query-based up-sampling (QU) modules. The Shrunk++ structure reduces the computational cost of the encoder by up to 50% while maintaining competitive performance. Furthermore, we adapt SegViT to continual semantic segmentation, demonstrating nearly zero forgetting of previously learned knowledge. Experiments show that SegViTv2 surpasses recent segmentation methods on three popular benchmarks: ADE20k, COCO-Stuff-10k, and PASCAL-Context. The code is available at: https://github.com/zbwxp/SegVit.
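The authors' actual ATM implementation is in the linked repository; the following is only a minimal PyTorch sketch of the idea the abstract describes: one learnable query per class attends to the ViT's patch tokens, and the class-to-patch attention similarities are reused as per-class mask logits. All names and hyper-parameters here (ATMHead, embed_dim, a single-layer, single-head attention) are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class ATMHead(nn.Module):
    """Minimal sketch of an Attention-to-Mask (ATM) style decoder head.

    One learnable query per class attends to the patch tokens of a plain
    ViT encoder; the attention similarities are reinterpreted as per-class
    mask logits. Illustrative only, not the paper's exact architecture.
    """

    def __init__(self, num_classes: int, embed_dim: int = 768):
        super().__init__()
        # One learnable query vector per semantic class.
        self.class_queries = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.q_proj = nn.Linear(embed_dim, embed_dim)
        self.k_proj = nn.Linear(embed_dim, embed_dim)
        self.scale = embed_dim ** -0.5

    def forward(self, tokens: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # tokens: (B, N, C) patch tokens from the ViT encoder, with N = h * w.
        b = tokens.size(0)
        q = self.q_proj(self.class_queries).expand(b, -1, -1)  # (B, K, C)
        k = self.k_proj(tokens)                                # (B, N, C)
        # Class-to-patch similarity map, reused as mask logits.
        attn = torch.einsum("bkc,bnc->bkn", q, k) * self.scale
        masks = attn.sigmoid()                                 # (B, K, N)
        return masks.view(b, -1, h, w)  # per-class masks at patch resolution

# Usage: a 14x14 patch grid from a ViT-Base-like encoder, 150 ADE20k classes.
head = ATMHead(num_classes=150)
feats = torch.randn(2, 14 * 14, 768)
print(head(feats, 14, 14).shape)  # torch.Size([2, 150, 14, 14])
```

Reusing the attention map as the mask itself is what keeps such a decoder lightweight: no extra per-pixel classification branch is needed beyond the query-key projections, which is consistent with the abstract's claim of roughly 5% of UPerNet's decoder cost.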

Details

Language :
English
ISSN :
0920-5691
Volume :
132
Issue :
4
Database :
Academic Search Index
Journal :
International Journal of Computer Vision
Publication Type :
Academic Journal
Accession number :
176264577
Full Text :
https://doi.org/10.1007/s11263-023-01894-8