
Cross-Modality Data Augmentation for Aerial Object Detection with Representation Learning

Authors :
Chiheng Wei
Lianfa Bai
Xiaoyu Chen
Jing Han
Source :
Remote Sensing, Vol 16, Iss 24, p 4649 (2024)
Publication Year :
2024
Publisher :
MDPI AG, 2024.

Abstract

Data augmentation methods offer a cost-effective and efficient alternative to acquiring additional data. They significantly enhance data diversity and model generalization, which makes them particularly favored in object detection tasks. However, existing data augmentation techniques primarily focus on the visible spectrum and are applied directly to RGB-T object detection tasks, overlooking the inherent differences in image data between the two tasks. Visible images capture rich color and texture information during the daytime, while infrared images are capable of imaging in complex low-light scenarios during the nighttime. By integrating image information from both modalities, their complementary characteristics can be exploited to improve the overall effectiveness of data augmentation methods. To address this, we propose a cross-modality data augmentation method tailored for RGB-T object detection, leveraging masked image modeling within representation learning. Specifically, we focus on the temporal consistency of infrared images and combine them with visible images under varying lighting conditions for joint data augmentation, thereby enhancing the realism of the augmented images. Using the masked image modeling method, we reconstruct images by integrating multimodal features, achieving cross-modality data augmentation in feature space. Additionally, we investigate the differences and complementarities between data augmentation methods in data space and feature space. Building upon existing theoretical foundations, we propose an integrative framework that combines these methods for improved augmentation effectiveness. Furthermore, we address the slow convergence observed with the existing Mosaic method in aerial imagery by introducing a multi-scale training strategy and proposing a full-scale Mosaic method as a complement. This optimization significantly accelerates network convergence. The experimental results validate the effectiveness of our proposed method and highlight its potential for further advancements in cross-modality object detection tasks.
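
For readers unfamiliar with the existing Mosaic method that the abstract refers to, the following is a minimal sketch of the basic four-image idea only. It is not the authors' full-scale Mosaic or their cross-modality method. The function name `mosaic4` and the parameters `out_size` and `center` are hypothetical, each tile is resized rather than cropped to keep the sketch short, and bounding-box handling is omitted. Reusing one centre for a visible image set and its infrared counterpart, as in the usage lines, is an assumption about how aligned RGB-T pairs might be kept in register.

```python
import numpy as np

def mosaic4(images, out_size=640, center=None, rng=None):
    """Tile four HxWxC images around one centre point (basic Mosaic sketch)."""
    if len(images) != 4:
        raise ValueError("mosaic4 expects exactly four images")
    rng = np.random.default_rng() if rng is None else rng
    if center is None:
        # Random centre kept away from the borders so no quadrant is empty.
        cx = int(rng.integers(out_size // 4, 3 * out_size // 4))
        cy = int(rng.integers(out_size // 4, 3 * out_size // 4))
    else:
        cx, cy = center
    channels = images[0].shape[2]
    canvas = np.zeros((out_size, out_size, channels), dtype=images[0].dtype)
    # (y0, y1, x0, x1) for top-left, top-right, bottom-left, bottom-right.
    regions = [(0, cy, 0, cx), (0, cy, cx, out_size),
               (cy, out_size, 0, cx), (cy, out_size, cx, out_size)]
    for img, (y0, y1, x0, x1) in zip(images, regions):
        h, w = y1 - y0, x1 - x0
        # Nearest-neighbour resize by index sampling (NumPy only, no OpenCV).
        ys = np.arange(h) * img.shape[0] // h
        xs = np.arange(w) * img.shape[1] // w
        canvas[y0:y1, x0:x1] = img[ys][:, xs]
    return canvas, (cx, cy)

# Usage: build a visible mosaic, then reuse its centre for the infrared set
# (assumed here to be pixel-aligned with the visible images).
visible = [np.full((32, 48, 3), i, dtype=np.uint8) for i in range(1, 5)]
infrared = [np.full((32, 48, 1), 10 * i, dtype=np.uint8) for i in range(1, 5)]
vis_mosaic, centre = mosaic4(visible, out_size=64, rng=np.random.default_rng(0))
ir_mosaic, _ = mosaic4(infrared, out_size=64, center=centre)
```

A production pipeline would also remap and clip the object annotations of each tile, which this sketch leaves out.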

Details

Language :
English
ISSN :
2072-4292
Volume :
16
Issue :
24
Database :
Directory of Open Access Journals
Journal :
Remote Sensing
Publication Type :
Academic Journal
Accession number :
edsdoj.9d17e64a16b74340a281a0740549b63e
Document Type :
article
Full Text :
https://doi.org/10.3390/rs16244649