Cross-Modality Data Augmentation for Aerial Object Detection with Representation Learning
- Source :
- Remote Sensing, Vol 16, Iss 24, p 4649 (2024)
- Publication Year :
- 2024
- Publisher :
- MDPI AG, 2024.
Abstract
- Data augmentation methods offer a cost-effective and efficient alternative to the acquisition of additional data, significantly enhancing data diversity and model generalization, making them particularly favored in object detection tasks. However, existing data augmentation techniques primarily focus on the visible spectrum and are directly applied to RGB-T object detection tasks, overlooking the inherent differences in image data between the two tasks. Visible images capture rich color and texture information during the daytime, while infrared images are capable of imaging under low-light complex scenarios during the nighttime. By integrating image information from both modalities, their complementary characteristics can be exploited to improve the overall effectiveness of data augmentation methods. To address this, we propose a cross-modality data augmentation method tailored for RGB-T object detection, leveraging masked image modeling within representation learning. Specifically, we focus on the temporal consistency of infrared images and combine them with visible images under varying lighting conditions for joint data augmentation, thereby enhancing the realism of the augmented images. Utilizing the masked image modeling method, we reconstruct images by integrating multimodal features, achieving cross-modality data augmentation in feature space. Additionally, we investigate the differences and complementarities between data augmentation methods in data space and feature space. Building upon existing theoretical foundations, we propose an integrative framework that combines these methods for improved augmentation effectiveness. Furthermore, we address the slow convergence observed with the existing Mosaic method in aerial imagery by introducing a multi-scale training strategy and proposing a full-scale Mosaic method as a complement. This optimization significantly accelerates network convergence. The experimental results validate the effectiveness of our proposed method and highlight its potential for further advancements in cross-modality object detection tasks.
- Subjects :
- data augmentation
cross-modality
object detection
representation learning
Science
Details
- Language :
- English
- ISSN :
- 2072-4292
- Volume :
- 16
- Issue :
- 24
- Database :
- Directory of Open Access Journals
- Journal :
- Remote Sensing
- Publication Type :
- Academic Journal
- Accession number :
- edsdoj.9d17e64a16b74340a281a0740549b63e
- Document Type :
- article
- Full Text :
- https://doi.org/10.3390/rs16244649