
MultiSenseSeg: A Cost-Effective Unified Multimodal Semantic Segmentation Model for Remote Sensing

Authors :
Wang, Qingpeng
Chen, Wei
Huang, Zhou
Tang, Hongzhao
Yang, Lan
Source :
IEEE Transactions on Geoscience and Remote Sensing; 2024, Vol. 62, Issue 1, pp. 1-24 (24 pages)
Publication Year :
2024

Abstract

Semantic segmentation is an essential technique in remote sensing. Until recently, most related research has focused on advancing semantic segmentation models based on monomodal imagery, and less attention has been given to models that exploit multimodal remote sensing data. Moreover, most existing multimodal approaches consider only limited bimodal settings and cannot utilize three or more modalities simultaneously. The high computational cost of previous feature fusion paradigms further hinders their application in broader cases. Designing a unified method that handles an arbitrary number of modalities for multimodal semantic segmentation therefore remains an open problem. To address these challenges, this study proposes a cost-effective multimodal sensing semantic segmentation model (MultiSenseSeg). MultiSenseSeg employs multiple lightweight modality-specific experts (MSEs), an adaptive multimodal matching (AMM) module, and a single feature extraction pipeline to efficiently model intramodal and intermodal relationships. Benefiting from these designs, MultiSenseSeg serves as a unified multimodal model that covers both monomodal and bimodal cases and readily extrapolates to scenarios with more modalities, achieving semantic segmentation of arbitrary quantities of multimodal data. To evaluate the method, we select several state-of-the-art (SOTA) semantic segmentation models from the past three years and conduct extensive experiments on two public multimodal datasets. The results show that MultiSenseSeg not only achieves higher accuracy but also offers user-friendly modality extrapolation, allowing end-to-end training for consumer-grade users with limited hardware resources. The model's code will be available at https://github.com/W-qp/MultiSenseSeg.
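
To make the quantity-agnostic design described above concrete, the following is a minimal PyTorch sketch of the general idea: one lightweight expert per input modality feeding a single shared feature extraction pipeline. The class names, layer choices, and the simple averaging used for fusion are illustrative assumptions standing in for the paper's MSE, AMM, and backbone components; they are not the authors' actual MultiSenseSeg implementation, which is available at the repository linked in the abstract.

```python
# Minimal sketch (assumed, not the paper's code): per-modality lightweight
# experts feed a single shared pipeline, so the model accepts any number of
# modalities. The averaging fusion below is a naive stand-in for the AMM module.
import torch
import torch.nn as nn


class ModalityExpert(nn.Module):
    """Lightweight per-modality stem (hypothetical stand-in for an MSE)."""

    def __init__(self, in_channels: int, embed_dim: int = 64):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_channels, embed_dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(embed_dim),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.stem(x)


class UnifiedMultimodalSegmenter(nn.Module):
    """Accepts an arbitrary list of modalities and shares one backbone."""

    def __init__(self, modality_channels, num_classes: int, embed_dim: int = 64):
        super().__init__()
        # One lightweight expert per modality (e.g., optical, SAR, DSM).
        self.experts = nn.ModuleList(
            [ModalityExpert(c, embed_dim) for c in modality_channels]
        )
        # Single shared feature extraction pipeline (placeholder backbone).
        self.backbone = nn.Sequential(
            nn.Conv2d(embed_dim, embed_dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.head = nn.Conv2d(embed_dim, num_classes, kernel_size=1)

    def forward(self, inputs):
        # inputs: list of tensors, one per modality, each of shape (B, C_i, H, W).
        feats = [expert(x) for expert, x in zip(self.experts, inputs)]
        fused = torch.stack(feats, dim=0).mean(dim=0)  # naive fusion, not AMM
        return self.head(self.backbone(fused))


# Usage: a bimodal case (3-band optical + 1-band DSM); adding a modality only
# requires appending its channel count to modality_channels.
model = UnifiedMultimodalSegmenter(modality_channels=[3, 1], num_classes=6)
logits = model([torch.randn(2, 3, 128, 128), torch.randn(2, 1, 128, 128)])
print(logits.shape)  # torch.Size([2, 6, 128, 128])
```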

Details

Language :
English
ISSN :
0196-2892 and 1558-0644
Volume :
62
Issue :
1
Database :
Supplemental Index
Journal :
IEEE Transactions on Geoscience and Remote Sensing
Publication Type :
Periodical
Accession number :
ejs66238457
Full Text :
https://doi.org/10.1109/TGRS.2024.3390750