In-Domain Supervised and Contrastive Self-Supervised Representation Learning for Dense Prediction Problems in Remote Sensing Imageries
- Source :
- IEEE Access, Vol 12, Pp 183510-183524 (2024)
- Publication Year :
- 2024
- Publisher :
- IEEE, 2024.
Abstract
- Recent advances in convolutional neural networks have improved computer vision applications, including satellite imagery analysis. However, the scarcity of large labeled datasets and the complexity of remote sensing tasks make supervised learning less effective. ImageNet pre-trained models have been used to address this, but the domain gap between natural and satellite images limits their usefulness. This motivates us to explore both supervised and self-supervised learning to capture in-domain visual representations from satellite images, address the domain gap with ImageNet, and reduce the need for labeled data and computational resources. We also seek to identify the characteristics that make a dataset a suitable candidate for representation learning in the satellite imagery domain. The choice of pre-training dataset directly influences model performance and generalization. Because the many available datasets differ in quality, resolution, and relevance to specific tasks, selecting an appropriate one is difficult. The weights obtained from suitable datasets serve as the initial weights for segmentation and object detection models. For self-supervised pre-training, we used the SimSiam algorithm with a ResNet50 backbone. Our results indicate that selecting a dataset with high spatial resolution is crucial, as it significantly enhances feature learning and improves model performance in remote sensing applications. This study also explores the impact of hierarchical pre-training on dense prediction tasks, starting from public datasets such as ImageNet and continuing with in-domain datasets.
Our systematic approach improved the mean Intersection over Union (mIoU) score and pixel accuracy by 4.06% and 9.62%, respectively, compared with existing literature. Relative to our baseline model, the proposed method improved mIoU by 2.1% and pixel accuracy by 0.88% for semantic segmentation on the DeepGlobe Land Cover Classification dataset, surpassing conventional ImageNet pre-trained weights. Additionally, we re-implemented the DeepLabv3 architecture so that previously trained weights can be transferred to its backbone. To speed up convergence, we set the overall output stride to 32. In object detection tasks, including those involving Oil Tank Storage and Airplanes, the mean Average Precision (mAP) improved by 2.65% and 1.47%, respectively. These findings indicate that selecting an appropriate dataset for pre-training visual representations can significantly improve remote sensing image analysis. Our proposed method is therefore a promising and practical approach for advancing this field.
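The abstract reports segmentation quality as mIoU and pixel accuracy without defining them. The sketch below shows how these two metrics are commonly computed from a confusion matrix. It is a minimal illustration and not the authors' evaluation code: the function names (`confusion_matrix`, `pixel_accuracy`, `mean_iou`), the toy labels, and the choice to skip classes that never occur are all assumptions made for this example.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of pixels whose predicted class matches the ground truth."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def mean_iou(cm):
    """Average of per-class IoU = TP / (TP + FP + FN).

    Assumption: classes absent from both labels and predictions are skipped.
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                         # class c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp    # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Hypothetical flattened label maps for six pixels and three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(pixel_accuracy(cm), mean_iou(cm))
```

On this toy input, four of six pixels are correct, while the per-class IoU values are 1/3, 2/3, and 1/2, so the two metrics can differ noticeably even on the same predictions.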
Details
- Language :
- English
- ISSN :
- 21693536
- Volume :
- 12
- Database :
- Directory of Open Access Journals
- Journal :
- IEEE Access
- Publication Type :
- Academic Journal
- Accession number :
- edsdoj.6515b6d3af7a42b7b741362dcf010dd0
- Document Type :
- article
- Full Text :
- https://doi.org/10.1109/ACCESS.2024.3510779