Compensating for Local Ambiguity With Encoder-Decoder in Urban Scene Segmentation.

Authors :: Tang, Quan; Liu, Fagui; Zhang, Tong; Jiang, Jun; Zhang, Yu; Zhu, Boyuan; Tang, Xuhao
Source :: IEEE Transactions on Intelligent Transportation Systems; Oct2022, Vol. 23 Issue 10, p19224-19235, 12p
Publication Year :: 2022
Abstract: Semantic segmentation plays a critical role in scene understanding for self-driving vehicles. A line of efforts has proven that global context matters in urban scene segmentation due to massive scale changes. However, we find that existing methods suffer from local ambiguities when dissipating continuous local context, i.e. scrambling to a huge receptive field of global cues by coarse pooling. To this end, this paper proposes a new Context Aggregation Module (CAM) that consists of two primary components: context encoding using no coarse pooling but encoder-decoders with appropriate sampling scales and gated fusion that extends gate attention mechanism to balance different-scale context during feature fusion. Weeding out coarse pooling and applying the encoder-decoder inherits the merits of exploring global context while avoiding the drawback of losing local contextual continuity. We then construct a Context Aggregation Network (CANet) and conduct extensive evaluations on challenging autonomous driving benchmarks of Cityscapes, CamVid and BDD100K. Consistently improved results evidence the effectiveness. Notably, we attain competitive mIoU 82.7% on Cityscapes and optimal mIoU 80.5% on CamVid. [ABSTRACT FROM AUTHOR]
