Back to Search Start Over

Sparse Attention Module for optimizing semantic segmentation performance combined with a multi-task feature extraction network.

Authors :
Jiang, Min
Zhai, Fuhao
Kong, Jun
Source :
Visual Computer. Jul2022, Vol. 38 Issue 7, p2473-2488. 16p.
Publication Year :
2022

Abstract

In the task of semantic segmentation, researchers often use self-attention module to capture long-range contextual information. These methods are often effective. However, the use of the self-attention module will cause a problem that cannot be ignored, that is, the huge consumption of computing resources. Therefore, how to reduce the resource consumption of the self-attention module under the premise of ensuring performance is a very meaningful research topic. In this paper, we propose a Sparse Attention Model combined with a powerful multi-task feature extraction network for semantic segmentation. Compared with the classic self-attention model, our Sparse Attention Model does not calculate the inner product between pairs of all vectors. Instead, we first sparse the feature block Query and the feature block Key defined in self-attention module through the credit matrix generated by the pre-output. Then, we perform similarity modeling on the two sparse feature blocks. Meanwhile, to ensure that the vectors in Query could capture dense contextual information, we design a Class Attention Module and embed it into Sparse Attention Module. Note that, compared with Dual Attention Network for scene segmentation, our attention module greatly reduces the consumption of computing resources while ensuring the accuracy. Furthermore, in the stage of feature extraction, the use of downsampling will cause serious loss of detailed information and affect the segmentation performance of the network, so we adopt a multi-task feature extraction network. It learns semantic features and edge features in parallel, and we feed the learned edge features into the deep layer of the network to help restore detailed information for capturing high-quality semantic features. We do not use pure concatenation. Instead, we extract the edge features related to each channel by element-wise multiplication before concatenation. Finally, we conduct experiments on three datasets: Cityscapes, PASCAL VOC2012 and ADE20K, and obtain competitive results. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
01782789
Volume :
38
Issue :
7
Database :
Academic Search Index
Journal :
Visual Computer
Publication Type :
Academic Journal
Accession number :
157319294
Full Text :
https://doi.org/10.1007/s00371-021-02124-3