Self-Supervised Pretraining With Monocular Height Estimation for Semantic Segmentation

Authors :
Xiong, Zhitong
Chen, Sining
Shi, Yilei
Zhu, Xiao Xiang
Source :
IEEE Transactions on Geoscience and Remote Sensing; 2024, Vol. 62 Issue: 1 p1-12, 12p
Publication Year :
2024

Abstract

Monocular height estimation (MHE) is key to generating 3-D city models, which are essential for rapid disaster response. Rather than focusing on performance gains alone, this study probes the interpretability of MHE networks. We find that neurons within MHE models are selective for both height and semantic classes. This finding sheds light on the inner workings of MHE models and suggests strategies for using elevation data more effectively. Building on it, we propose a framework that employs MHE as a self-supervised pretraining method for remote sensing (RS) imagery, which significantly improves semantic segmentation performance. Furthermore, we develop a disentangled latent transformer (DLT) module that leverages explainable deep representations from pretrained MHE networks for unsupervised semantic segmentation. Our method demonstrates the potential of MHE tasks for developing foundation models for pixel-level semantic analysis. Additionally, we present a new dataset for benchmarking both semantic segmentation and height estimation. The dataset and code will be publicly available at https://github.com/zhu-xlab/DLT-MHE.pytorch.

Details

Language :
English
ISSN :
0196-2892 and 1558-0644
Volume :
62
Issue :
1
Database :
Supplemental Index
Journal :
IEEE Transactions on Geoscience and Remote Sensing
Publication Type :
Periodical
Accession number :
ejs66997285
Full Text :
https://doi.org/10.1109/TGRS.2024.3412629