1,233 results for "scene classification"
Search Results
2. RCSFN: A remote sensing image scene classification and recognition network based on rectangle convolutional self attention fusion.
- Author
-
Hou, Jingjin, Zhou, Houkui, Yu, Huimin, and Hu, Haoji
- Abstract
Remote sensing scene classification is a critical task in the processing and analysis of remote sensing images. Traditional methods typically use standard convolutional kernels to extract feature information. Although these methods have seen improvements, they still struggle to fully capture unique local details, thus affecting classification accuracy. Each category within remote sensing scenes has its unique local details, such as the rectangular features of buildings in schools or industrial areas, as well as bridges and roads in parks or squares. The most important features are often these rectangular structures and their spatial positions, which standard convolutional kernels find challenging to capture effectively. To address this issue, we propose a remote sensing scene classification method based on a Rectangle Convolution Self-Attention Fusion Network (RCSFN) architecture. In the RCSFN network, the Rectangle Convolution Maximum Fusion (RCMF) module operates in parallel with the first 4 × 4 convolutional layer of VanillaNet-5. The RCMF module uses two different rectangular convolutional kernels to extract different receptive fields, enhancing the extraction of shallow local features through addition and fusion. This process, combined with the concatenation of the original input features, results in richer local detail information. Additionally, we introduce an Area Selection (AS) module that focuses on selecting feature information within local regions. The Sequential Polarisation Self-Attention (SPS) mechanism, integrated with the Mini Region Convolution (MRC) module through feature multiplication, enhances important features and improves spatial positional relationships, thereby increasing the accuracy of recognising categories with rectangular or elongated features. Experiments were carried out on the AID and NWPU-RESISC45 datasets, and the overall classification accuracy was 96.56% and 92.46%, respectively. This shows that the RCSFN network model proposed in this paper is feasible and effective for classification problems involving classes with unique local detail features. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
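As a rough illustration of the rectangular-kernel idea described in the RCSFN abstract above, the sketch below runs two elongated convolutions (1×k and k×1) in parallel, adds their responses, and concatenates the result with a plain square-kernel branch. All module and parameter names are assumptions for illustration, not the authors' RCMF implementation.

```python
import torch
import torch.nn as nn

class RectConvFusion(nn.Module):
    """Illustrative stand-in for a rectangle-convolution fusion block:
    two elongated kernels capture horizontal/vertical structures and
    their summed response is concatenated with a square-kernel branch."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 7):
        super().__init__()
        self.horiz = nn.Conv2d(in_ch, out_ch, kernel_size=(1, k), padding=(0, k // 2))
        self.vert = nn.Conv2d(in_ch, out_ch, kernel_size=(k, 1), padding=(k // 2, 0))
        self.square = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rect = self.horiz(x) + self.vert(x)          # fuse elongated receptive fields
        return torch.cat([rect, self.square(x)], 1)  # concatenate with the plain branch

x = torch.randn(1, 3, 224, 224)
print(RectConvFusion(3, 16)(x).shape)  # torch.Size([1, 32, 224, 224])
```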
3. Multi-Scale and Multi-Network Deep Feature Fusion for Discriminative Scene Classification of High-Resolution Remote Sensing Images.
- Author
-
Yuan, Baohua, Sehra, Sukhjit Singh, and Chiu, Bernard
- Subjects
- *
CONVOLUTIONAL neural networks , *DISCRIMINANT analysis , *FEATURE extraction , *REMOTE-sensing images , *REMOTE sensing - Abstract
The advancement in satellite image sensors has enabled the acquisition of high-resolution remote sensing (HRRS) images. However, interpreting these images accurately and obtaining the computational power needed to do so is challenging due to the complexity involved. This manuscript proposed a multi-stream convolutional neural network (CNN) fusion framework that involves multi-scale and multi-CNN integration for HRRS image recognition. The pre-trained CNNs were used to learn and extract semantic features from multi-scale HRRS images. Feature extraction using pre-trained CNNs is more efficient than training a CNN from scratch or fine-tuning a CNN. Discriminative canonical correlation analysis (DCCA) was used to fuse deep features extracted across CNNs and image scales. DCCA reduced the dimension of the features extracted from CNNs while providing a discriminative representation by maximizing the within-class correlation and minimizing the between-class correlation. The proposed model has been evaluated on NWPU-RESISC45 and UC Merced datasets. The accuracy associated with DCCA was 10% and 6% higher than discriminant correlation analysis (DCA) in the NWPU-RESISC45 and UC Merced datasets. The advantage of DCCA was better demonstrated in the NWPU-RESISC45 dataset due to the incorporation of richer within-class variability in this dataset. While both DCA and DCCA minimize between-class correlation, only DCCA maximizes the within-class correlation and, therefore, attains better accuracy. The proposed framework achieved higher accuracy than all state-of-the-art frameworks involving unsupervised learning and pre-trained CNNs and 2–3% higher than the majority of fine-tuned CNNs. The proposed framework offers computational time advantages, requiring only 13 s for training in NWPU-RESISC45, compared to a day for fine-tuning the existing CNNs. Thus, the proposed framework achieves a favourable balance between efficiency and accuracy in HRRS image recognition. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
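The following minimal sketch shows the kind of multi-scale, multi-CNN feature extraction the abstract above describes, using two pre-trained torchvision backbones as fixed extractors at two image scales. Plain concatenation stands in for the paper's DCCA fusion step, and the scales (224 and 320 px) are illustrative assumptions.

```python
import torch
from torchvision import models, transforms

# Two pre-trained backbones used as fixed feature extractors (no fine-tuning),
# applied to the same image at two scales; concatenation stands in for DCCA.
resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
resnet.fc = torch.nn.Identity()          # expose the 2048-d pooled features
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()
vgg.classifier = vgg.classifier[:-1]     # keep the 4096-d fc7 features

def multi_scale_features(img: torch.Tensor) -> torch.Tensor:
    feats = []
    for size in (224, 320):              # two illustrative scales
        x = transforms.functional.resize(img, [size, size]).unsqueeze(0)
        with torch.no_grad():
            feats += [resnet(x).flatten(1), vgg(x).flatten(1)]
    return torch.cat(feats, dim=1)       # fused descriptor fed to a downstream classifier

print(multi_scale_features(torch.randn(3, 256, 256)).shape)  # (1, 12288)
```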
4. Adaptive Classification Network for Similar Features Between Classes in Automatic Driving Scenarios [自动驾驶场景类间相似特征自适应分类网络].
- Author
-
Jiang, Yanji, Feng, Yuzhou, Dong, Hao, and Tian, Jialin
- Abstract
No abstract is provided in this record; copyright held by Journal of Frontiers of Computer Science & Technology. (See entry 11 below for the English-language record of this paper.)
- Published
- 2024
- Full Text
- View/download PDF
5. A New Scene Sensing Model Based on Multi-Source Data from Smartphones.
- Author
-
Ding, Zhenke, Deng, Zhongliang, Hu, Enwen, Liu, Bingxun, Zhang, Zhichao, and Ma, Mingyang
- Subjects
- *
CONVOLUTIONAL neural networks , *METAHEURISTIC algorithms , *MULTISENSOR data fusion , *ARTIFICIAL satellites in navigation , *GLOBAL Positioning System , *SENSOR networks - Abstract
Smartphones with integrated sensors play an important role in people's lives, and in advanced multi-sensor fusion navigation systems, the use of individual sensor information is crucial. Because of the different environments, the weights of the sensors will be different, which will also affect the method and results of multi-source fusion positioning. Based on the multi-source data from smartphone sensors, this study explores five types of information—Global Navigation Satellite System (GNSS), Inertial Measurement Units (IMUs), cellular networks, optical sensors, and Wi-Fi sensors—characterizing the temporal, spatial, and mathematical statistical features of the data, and it constructs a multi-scale, multi-window, and context-connected scene sensing model to accurately detect the environmental scene in indoor, semi-indoor, outdoor, and semi-outdoor spaces, thus providing a good basis for multi-sensor positioning in a multi-sensor navigation system. Detecting environmental scenes provides an environmental positioning basis for multi-sensor fusion localization. This model is divided into four main parts: multi-sensor-based data mining, a multi-scale convolutional neural network (CNN), a bidirectional long short-term memory (BiLSTM) network combined with contextual information, and a meta-heuristic optimization algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
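A toy sketch of the model family described in the abstract above: parallel 1-D convolutions at several kernel sizes (multi-scale) feeding a bidirectional LSTM over a window of smartphone sensor readings, ending in a four-way indoor/semi-indoor/outdoor/semi-outdoor classifier. Input dimensions and layer sizes are assumptions, and the meta-heuristic optimization stage is omitted.

```python
import torch
import torch.nn as nn

class MultiScaleCNNBiLSTM(nn.Module):
    """Toy scene-sensing model: parallel 1-D convolutions with different
    kernel sizes (multi-scale), a bidirectional LSTM over time, and a
    4-way classifier (indoor / semi-indoor / outdoor / semi-outdoor)."""
    def __init__(self, n_features: int = 16, n_classes: int = 4):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(n_features, 32, kernel_size=k, padding=k // 2) for k in (3, 5, 7)
        )
        self.lstm = nn.LSTM(96, 64, batch_first=True, bidirectional=True)
        self.head = nn.Linear(128, n_classes)

    def forward(self, x):                      # x: (batch, time, features)
        x = x.transpose(1, 2)                  # -> (batch, features, time)
        z = torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
        out, _ = self.lstm(z.transpose(1, 2))  # back to (batch, time, channels)
        return self.head(out[:, -1])           # last time step -> class logits

print(MultiScaleCNNBiLSTM()(torch.randn(2, 128, 16)).shape)  # (2, 4)
```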
6. Instance-Level Scaling and Dynamic Margin-Alignment Knowledge Distillation for Remote Sensing Image Scene Classification †.
- Author
-
Li, Chuan, Teng, Xiao, Ding, Yan, and Lan, Long
- Subjects
- *
ARTIFICIAL neural networks , *IMAGE recognition (Computer vision) , *DEEP learning , *REMOTE sensing , *DISTILLATION - Abstract
Remote sensing image (RSI) scene classification aims to identify semantic categories in RSI using neural networks. However, high-performance deep neural networks typically demand substantial storage and computational resources, making practical deployment challenging. Knowledge distillation has emerged as an effective technique for developing compact models that maintain high classification accuracy in RSI tasks. Existing knowledge distillation methods often overlook the high inter-class similarity in RSI scenes, leading to low-confidence soft labels from the teacher model, which can mislead the student model. Conversely, overly confident soft labels may discard valuable non-target information. Additionally, the significant intra-class variability in RSI contributes to instability in the model's decision boundaries. To address these challenges, we propose an efficient method called instance-level scaling and dynamic margin-alignment knowledge distillation (ISDM) for RSI scene classification. To balance the target and non-target class influence, we apply an entropy regularization loss to scale the teacher model's target class at the instance level. Moreover, we introduce dynamic margin alignment between the student and teacher models to improve the student's discriminative capability. By optimizing soft labels and enhancing the student's ability to distinguish between classes, our method reduces the effects of inter-class similarity and intra-class variability. Experimental results on three public RSI scene classification datasets (AID, UCMerced, and NWPU-RESISC) demonstrate that our method achieves state-of-the-art performance across all teacher–student pairs with lower computational costs. Additionally, we validate the generalization of our approach on general datasets, including CIFAR-100 and ImageNet-1k. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
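For orientation, the snippet below implements the generic temperature-scaled knowledge distillation objective that methods such as ISDM build on: a KL term between softened teacher and student distributions mixed with the usual cross-entropy. The instance-level scaling and dynamic margin alignment contributed by the paper are not reproduced here, and the temperature/weighting values are illustrative.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T: float = 4.0, alpha: float = 0.7):
    """Generic soft-label distillation: KL divergence between temperature-softened
    teacher and student distributions, mixed with the usual hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                                  # rescale so gradients stay comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

s, t = torch.randn(8, 45), torch.randn(8, 45)    # e.g. 45 classes as in NWPU-RESISC
print(kd_loss(s, t, torch.randint(0, 45, (8,))))
```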
7. Deep and shallow feature fusion framework for remote sensing open pit coal mine scene recognition.
- Author
-
Liu, Yang and Zhang, Jin
- Abstract
Understanding land use and damage in open-pit coal mining areas is crucial for effective scientific oversight and management. Current recognition methods exhibit limitations: traditional approaches depend on manually designed features, which offer limited expressiveness, whereas deep learning techniques are heavily reliant on sample data. In order to overcome the aforementioned limitations, a three-branch feature extraction framework was proposed in the present study. The proposed framework effectively fuses deep features (DF) and shallow features (SF), and can accomplish scene recognition tasks with high accuracy and fewer samples. Deep features are enhanced through a neighbouring feature attention module and a Graph Convolutional Network (GCN) module, which capture both neighbouring features and the correlation between local scene information. Shallow features are extracted using the Gray-Level Co-occurrence Matrix (GLCM) and Gabor filters, which respectively capture local and overall texture variations. Evaluation results on the AID and RSSCN7 datasets demonstrate that the proposed deep feature extraction model achieved classification accuracies of 97.53% and 96.73%, respectively, indicating superior performance in deep feature extraction tasks. Finally, the two kinds of features were fused and input into the particle swarm algorithm optimised support vector machine (PSO-SVM) to classify the scenes of remote sensing images, and the classification accuracy reached 92.78%, outperforming four other classification methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
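The shallow-feature branch described above (GLCM statistics plus Gabor responses, classified by an SVM) can be sketched with scikit-image and scikit-learn as below. A plain RBF SVC stands in for the PSO-optimised SVM, and the filter settings and synthetic data are illustrative assumptions.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from skimage.filters import gabor
from sklearn.svm import SVC

def shallow_features(gray: np.ndarray) -> np.ndarray:
    """GLCM statistics plus Gabor filter energies for one 8-bit grayscale patch."""
    glcm = graycomatrix(gray, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    stats = [graycoprops(glcm, p).mean()
             for p in ("contrast", "homogeneity", "energy", "correlation")]
    gimg = gray.astype(float)
    gabors = []
    for theta in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
        real, imag = gabor(gimg, frequency=0.2, theta=theta)
        gabors.append(np.sqrt(real ** 2 + imag ** 2).mean())   # filter energy per orientation
    return np.array(stats + gabors)

# Tiny synthetic demo: random texture patches classified by an RBF SVM
rng = np.random.default_rng(0)
X = np.stack([shallow_features(rng.integers(0, 256, (32, 32), dtype=np.uint8)) for _ in range(20)])
y = rng.integers(0, 2, 20)
print(SVC(kernel="rbf").fit(X, y).score(X, y))
```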
8. Pseudo-label meta-learner in semi-supervised few-shot learning for remote sensing image scene classification.
- Author
-
Miao, Wang, Huang, Kai, Xu, Zhe, Zhang, Jianting, Geng, Jie, and Jiang, Wen
- Subjects
SUPERVISED learning ,IMAGE recognition (Computer vision) ,REMOTE sensing ,MACHINE learning ,KNOWLEDGE representation (Information theory) - Abstract
Remote sensing image scene classification (RSISC) greatly benefits from the use of few-shot learning, as it enables the recognition of novel scenes with only a small amount of labeled data. Most previous works focused on learning representations of prior knowledge with scarce labeled data while ignoring the feasibility of using potential information with large amounts of unlabeled data. In this paper, we introduce a novel semi-supervised few-shot pseudo-label propagation method through the introduction of unlabeled knowledge. This approach utilizes the pseudo-loss property generated by the classifier to indirectly reflect the credibility of pseudo-labeled samples. Therefore, we propose a semi-supervised pseudo-loss confidence metric-based method called a pseudolabel meta-learner (PLML) for RSISC. Specifically, we adopt a pseudoloss estimation module to map the pseudo-labeled data obtained from different tasks to a unified pseudo-loss metric space. Then, the distributions of the pseudolosses with both correct and incorrect pseudolabels are fitted by a semi-supervised beta mixture model (ss-BMM). This model can iteratively select high-quality unlabeled data to enhance the self-training effect of the classifier. Finally, to address the problem of shifting pseudo-loss distributions in remote sensing images, a progressive self-training strategy is proposed to mitigate the cumulative error induced by the classifier. Experimental results demonstrate that our proposed PLML approach outperforms the existing alternatives on the NWPU-RESISC45, AID, and UC Merced datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Semi-Supervised Subcategory Centroid Alignment-Based Scene Classification for High-Resolution Remote Sensing Images †.
- Author
-
Mo, Nan and Zhu, Ruixi
- Subjects
- *
REMOTE sensing , *IMAGE representation , *KNOWLEDGE transfer , *CENTROID , *CLASSIFICATION - Abstract
It is usually hard to obtain adequate annotated data for delivering satisfactory scene classification results. Semi-supervised scene classification approaches can transfer the knowledge learned from previously annotated data to remote sensing images with scarce samples for satisfactory classification results. However, due to the differences between sensors, environments, seasons, and geographical locations, cross-domain remote sensing images exhibit feature distribution deviations. Therefore, semi-supervised scene classification methods may not achieve satisfactory classification accuracy. To address this problem, a novel semi-supervised subcategory centroid alignment (SSCA)-based scene classification approach is proposed. The SSCA framework is made up of two components, namely the rotation-robust convolutional feature extractor (RCFE) and the neighbor-based subcategory centroid alignment (NSCA). The RCFE aims to suppress the impact of rotation changes on remote sensing image representation, while the NSCA aims to decrease the impact of intra-category variety across domains on cross-domain scene classification. The SSCA algorithm and several competitive approaches are validated on two datasets to demonstrate its effectiveness. The results prove that the proposed SSCA approach performs better than most competitive approaches by no less than 2% overall accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. ADC-CPANet: A Remote Sensing Image Classification Method with Local–Global Feature Fusion [ADC-CPANet：一种局部—全局特征融合的遥感图像分类方法].
- Author
-
Wang, Wei, Li, Xijie, and Wang, Xin
- Subjects
CONVOLUTIONAL neural networks ,IMAGE recognition (Computer vision) ,COMPUTER vision ,FEATURE extraction ,REMOTE sensing ,DEEP learning - Abstract
No abstract is provided in this record; copyright held by Journal of Remote Sensing.
- Published
- 2024
- Full Text
- View/download PDF
11. Adaptive Classification Network for Similar Features Between Classes in Automatic Driving Scenarios
- Author
-
JIANG Yanji, FENG Yuzhou, DONG Hao, TIAN Jialin
- Subjects
autonomous driving ,scene classification ,inter-class similarity ,multi-scale structure ,feature screening ,adaptive training ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
Addressing the issue of inter-class similarity is a challenging task in the research of autonomous driving scene classification, which primarily focuses on learning the distinctive features of targets in real-world complex traffic scenarios with high similarity, and constructing the overall correlation between features for scene classification. To this end, a multi-scale adaptive feature selection network for autonomous driving scene classification is proposed. Initially, a dual multi-scale feature extraction module is utilized for preliminary processing to extract inter-class similar features at different scales. Subsequently, a feature differentiation screening module is designed to complete the screening of scene-similar features, enabling the network to focus more on the typical and easily distinguishable features of different scene categories. Then, the feature screening results and multi-scale feature maps are transferred to the feature fusion classification module for scene classification, and the correlation between scene features is captured. Finally, an adaptive learning algorithm dynamically adjusts the training parameters through the output results, accelerating the network's convergence speed and improving accuracy. The proposed method is compared with existing network methods on three datasets: BDD100k, BDD100k+ and self-made dataset. Compared with the Top2 networks, it leads in accuracy by 3.29%, 5.59% and 12.65% (relatively), respectively. Experimental results demonstrate the effectiveness of the proposed method and its strong generalization capability. The scene classification method presented in this paper aims to learn the typical and easily distinguishable features and their correlations under different complex scene categories, reducing the impact of inter-class similarity among multiple targets, thereby making the scene classification results in real-world traffic scenario datasets more accurate.
- Published
- 2024
- Full Text
- View/download PDF
12. Deep and shallow feature fusion framework for remote sensing open pit coal mine scene recognition
- Author
-
Yang Liu and Jin Zhang
- Subjects
Feature fusion ,Graph convolutional network (GCN) ,Remote sensing (RS) ,Scene classification ,Medicine ,Science - Abstract
Abstract Understanding land use and damage in open-pit coal mining areas is crucial for effective scientific oversight and management. Current recognition methods exhibit limitations: traditional approaches depend on manually designed features, which offer limited expressiveness, whereas deep learning techniques are heavily reliant on sample data. In order to overcome the aforementioned limitations, a three-branch feature extraction framework was proposed in the present study. The proposed framework effectively fuses deep features (DF) and shallow features (SF), and can accomplish scene recognition tasks with high accuracy and fewer samples. Deep features are enhanced through a neighbouring feature attention module and a Graph Convolutional Network (GCN) module, which capture both neighbouring features and the correlation between local scene information. Shallow features are extracted using the Gray-Level Co-occurrence Matrix (GLCM) and Gabor filters, which respectively capture local and overall texture variations. Evaluation results on the AID and RSSCN7 datasets demonstrate that the proposed deep feature extraction model achieved classification accuracies of 97.53% and 96.73%, respectively, indicating superior performance in deep feature extraction tasks. Finally, the two kinds of features were fused and input into the particle swarm algorithm optimised support vector machine (PSO-SVM) to classify the scenes of remote sensing images, and the classification accuracy reached 92.78%, outperforming four other classification methods.
- Published
- 2024
- Full Text
- View/download PDF
13. Human-Annotated Label Noise and Their Impact on ConvNets for Remote Sensing Image Scene Classification
- Author
-
Longkang Peng, Tao Wei, Xuehong Chen, Xiaobei Chen, Rui Sun, Luoma Wan, Jin Chen, and Xiaolin Zhu
- Subjects
Convolutional neural network (ConvNet) ,human-annotated label noise ,label noise ,remote sensing ,scene classification ,Ocean engineering ,TC1501-1800 ,Geophysics. Cosmic physics ,QC801-809 - Abstract
Human-labeled training datasets are essential for convolutional neural networks (ConvNets) in satellite image scene classification. Annotation errors are unavoidable due to the complexity of satellite images. However, the distribution of real-world human-annotated label noises on satellite images and their impact on ConvNets have not been investigated. To fill this research gap, this article, for the first time, collected real-world labels from 32 participants and explored how their annotated label noise affects three representative ConvNets (VGG16, GoogleNet, and ResNet-50) for remote sensing image scene classification. We found that 1) human-annotated label noise exhibits significant class and instance dependence; 2) an additional 1% of human-annotated label noise in training data leads to a 0.5% reduction in the overall accuracy of ConvNets classification; and 3) the error pattern of ConvNet predictions was strongly correlated with that of participant's labels. To uncover the mechanism underlying the impact of human labeling errors on ConvNets, we compared it with three types of simulated label noise: uniform noise, class-dependent noise, and instance-dependent noise. Our results show that the impact of human-annotated label noise on ConvNets significantly differs from all three types of simulated label noise, while both class dependence and instance dependence contribute to the impact of human-annotated label noise on ConvNets. Additionally, the label noise estimation algorithm (confident learning) cannot fully identify label noise. These observations necessitate a reevaluation of the handling of noisy labels, and we anticipate that our real-world label noise dataset would facilitate the future development and assessment of label-noise learning algorithms.
- Published
- 2025
- Full Text
- View/download PDF
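A small helper like the one below can generate the class-dependent simulated label noise that the study above compares human annotation errors against: a chosen fraction of labels is flipped, with the replacement class drawn from a per-class confusion distribution. The confusion matrix and noise rate here are invented for illustration.

```python
import numpy as np

def inject_class_dependent_noise(labels, confusion, rate, rng=None):
    """Flip a fraction `rate` of labels; the replacement class for each true
    class is drawn from that class's row of `confusion` (class-dependent
    noise, as opposed to uniform noise)."""
    rng = rng or np.random.default_rng(0)
    labels = np.asarray(labels).copy()
    flip = rng.random(labels.shape[0]) < rate
    for i in np.where(flip)[0]:
        labels[i] = rng.choice(len(confusion), p=confusion[labels[i]])
    return labels

# 3 classes; class 0 is mostly confused with class 1, etc. (illustrative matrix)
confusion = np.array([[0.0, 0.8, 0.2],
                      [0.6, 0.0, 0.4],
                      [0.5, 0.5, 0.0]])
clean = np.random.default_rng(1).integers(0, 3, 1000)
noisy = inject_class_dependent_noise(clean, confusion, rate=0.1)
print("actual noise rate:", (clean != noisy).mean())
```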
14. Smart City Community Watch—Camera-Based Community Watch for Traffic and Illegal Dumping
- Author
-
Nupur Pathak, Gangotri Biswal, Megha Goushal, Vraj Mistry, Palak Shah, Fenglian Li, and Jerry Gao
- Subjects
smart city ,illegal dumping ,video surveillance ,computer vision ,scene classification ,deep learning ,Engineering (General). Civil engineering (General) ,TA1-2040 - Abstract
The United States is the second-largest waste generator in the world, generating 4.9 pounds (2.2 kg) of Municipal Solid Waste (MSW) per person each day. The excessive amount of waste generated poses serious health and environmental risks, especially because of the prevalence of illegal dumping practices, including improper waste disposal in unauthorized areas. To clean up illegal dumping, the government spends approximately USD 600 per ton, which amounts to USD 178 billion per year. Municipalities face a critical challenge to detect and prevent illegal dumping activities. Current techniques to detect illegal dumping have limited accuracy in detection and do not support an integrated solution of detecting dumping, identifying the vehicle, and a decision algorithm notifying the municipalities in real-time. To tackle this issue, an innovative solution has been developed, utilizing a You Only Look Once (YOLO) detector YOLOv5 for detecting humans, vehicles, license plates, and trash. The solution incorporates DeepSORT for effective identification of illegal dumping by analyzing the distance between a human and the trash’s bounding box. It achieved an accuracy of 97% in dumping detection after training on real-time examples and the COCO dataset covering both daytime and nighttime scenarios. This combination of YOLOv5, DeepSORT, and the decision module demonstrates robust capabilities in detecting dumping. The objective of this web-based application is to minimize the adverse effects on the environment and public health. By leveraging advanced object detection and tracking techniques, along with a user-friendly web application, it aims to promote a cleaner, healthier environment for everyone by reducing improper waste disposal.
- Published
- 2024
- Full Text
- View/download PDF
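The decision step described above, linking a detected person to detected trash, reduces to a bounding-box proximity test. The sketch below hard-codes two detections in place of YOLOv5/DeepSORT outputs, and the pixel threshold is an assumed placeholder, not the system's calibrated value.

```python
import math

def centroid(box):                       # box = (x1, y1, x2, y2) in pixels
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

def dumping_suspected(person_box, trash_box, max_dist_px: float = 150.0) -> bool:
    """Flag a potential dumping event when a person's and a trash object's
    bounding-box centroids are closer than a pixel threshold."""
    (px, py), (tx, ty) = centroid(person_box), centroid(trash_box)
    return math.hypot(px - tx, py - ty) < max_dist_px

# Stand-in detections (in practice these come from the detector/tracker per frame)
person = (400, 200, 480, 420)
trash = (500, 380, 560, 440)
print(dumping_suspected(person, trash))  # True: centroids ~135 px apart
```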
15. Investigating the use of deep learning models for land cover classification from street‐level imagery.
- Author
-
Tsutsumida, Narumasa, Zhao, Jing, Shibuya, Naho, Nasahara, Kenlo, and Tadono, Takeo
- Subjects
- *
CONVOLUTIONAL neural networks , *TRANSFORMER models , *LAND cover , *DEEP learning , *REMOTE-sensing images - Abstract
Land cover classification mapping is the process of assigning labels to different types of land surfaces based on overhead imagery. However, acquiring reference samples through fieldwork for ground truth can be costly and time‐intensive. Additionally, annotating high‐resolution satellite images poses challenges, as certain land cover types are difficult to discern solely from nadir images. To address these challenges, this study examined the feasibility of using street‐level imagery to support the collection of reference samples and identify land cover. We utilized 18,022 images captured in Japan, with 14 different land cover classes. Our approach involved using convolutional neural networks based on Inception‐v4 and DenseNet, as well as Transformer‐based Vision and Swin Transformers, both with and without pre‐trained weights and fine‐tuning techniques. Additionally, we explored explainability through Gradient‐Weighted Class Activation Mapping (Grad‐CAM). Our results indicate that using a Vision Transformer was the most effective method, achieving an overall accuracy of 86.12% and allowing for full explainability of land cover targets within an image. This paper proposes a promising solution for land cover classification from street‐level imagery, which can be used for semi‐automatic reference sample collection from geo‐tagged street‐level photos. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
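A minimal fine-tuning loop for the Vision Transformer setup described above might look like the following, using timm to re-head a pre-trained ViT for the 14 land cover classes. The dataset directory, preprocessing, and hyperparameters are placeholders, not the study's configuration.

```python
import timm
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Pre-trained ViT re-headed for 14 land cover classes (dataset path is a placeholder).
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=14)
tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor(),
                         transforms.Normalize([0.5] * 3, [0.5] * 3)])
loader = DataLoader(datasets.ImageFolder("street_level_images/", transform=tf),
                    batch_size=32, shuffle=True)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

model.train()
for images, labels in loader:                    # one illustrative epoch
    opt.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(images), labels)
    loss.backward()
    opt.step()
```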
16. Scene representation using a new two-branch neural network model.
- Author
-
Parseh, Mohammad Javad, Rahmanimanesh, Mohammad, Keshavarzi, Parviz, and Azimifar, Zohreh
- Subjects
- *
ARTIFICIAL neural networks , *IMAGE representation , *DEEP learning , *COMPUTER vision , *RECOGNITION (Psychology) , *FEATURE extraction , *CONVOLUTIONAL neural networks - Abstract
Scene classification and recognition have always been one of the most challenging tasks of scene understanding due to the inherent ambiguity in visual scenes. The core of scene classification and recognition tasks is scene representation. Deep learning advances in computer vision, especially deep CNNs, have significantly improved scene representation in the last decade. Deep convolutional features extracted from deep CNNs provide discriminative representations of the images and are widely used in various computer vision tasks, such as scene classification. Deep convolutional features capture the appearance characteristics of the image and the spatial information about different image regions. Meanwhile, the semantic and context information obtained from high-level concepts about scene images, such as objects and their relationships, can significantly contribute to identifying scene images. Therefore, in this paper, we divide visual scenes into two categories, object-based and layout-based. Object-based scenes are scenes that have scene-specific objects and, based on those objects, can be described and identified. In contrast, the layout-based scenes do not have scene-specific objects and are described and identified based on the appearance and layout of the image. This paper proposes a new neural network model for representing and classifying visual scenes, which we call G-CNN (GNN-CNN). The proposed model includes two modules, feature extraction and feature fusion, and the feature extraction module composes of visual and semantic branches. The visual branch is responsible for extracting deep CNN features from the image, and the semantic branch is responsible for extracting semantic GNN features from the scene graph corresponding to the image. The feature fusion module is a novel two-stream neural network that fuses the CNN and GNN feature vectors to produce a comprehensive representation of the scene image. Finally, a fully-connected classifier classified the obtained comprehensive feature vector into one of the pre-defined categories. The proposed model has been evaluated on three benchmark scene datasets, UIUC Sports, MIT67, and SUN397, and obtained classification accuracy of 99.91%, 96.01%, and 85.32%, respectively. In addition, a new dataset named Scene40, which has been introduced in our previous paper, is also used for further evaluation of the proposed method. The comparison results based on classification accuracy criteria show that the proposed model can outperform the best previous methods on three benchmark scene datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
17. Q-A2NN: Quantized All-Adder Neural Networks for Onboard Remote Sensing Scene Classification.
- Author
-
Zhang, Ning, Chen, He, Chen, Liang, Wang, Jue, Wang, Guoqing, and Liu, Wenchao
- Subjects
- *
REMOTE sensing , *CONVOLUTIONAL neural networks , *CLASSIFICATION - Abstract
Performing remote sensing scene classification (RSSC) directly on satellites can alleviate data downlink burdens and reduce latency. Compared to convolutional neural networks (CNNs), the all-adder neural network (A2NN) is a novel basic neural network that is more suitable for onboard RSSC, enabling lower computational overhead by eliminating multiplication operations in convolutional layers. However, the extensive floating-point data and operations in A2NNs still lead to significant storage overhead and power consumption during hardware deployment. In this article, a shared scaling factor-based de-biasing quantization (SSDQ) method tailored for the quantization of A2NNs is proposed to address this issue, including a powers-of-two (POT)-based shared scaling factor quantization scheme and a multi-dimensional de-biasing (MDD) quantization strategy. Specifically, the POT-based shared scaling factor quantization scheme converts the adder filters in A2NNs to quantized adder filters with hardware-friendly integer input activations, weights, and operations. Thus, quantized A2NNs (Q-A2NNs) composed of quantized adder filters have lower computational and memory overheads than A2NNs, increasing their utility in hardware deployment. Although low-bit-width Q-A2NNs exhibit significantly reduced RSSC accuracy compared to A2NNs, this issue can be alleviated by employing the proposed MDD quantization strategy, which combines a weight-debiasing (WD) strategy, which reduces performance degradation due to deviations in the quantized weights, with a feature-debiasing (FD) strategy, which enhances the classification performance of Q-A2NNs through minimizing deviations among the output features of each layer. Extensive experiments and analyses demonstrate that the proposed SSDQ method can efficiently quantize A2NNs to obtain Q-A2NNs with low computational and memory overheads while maintaining comparable performance to A2NNs, thus having high potential for onboard RSSC. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
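The powers-of-two shared-scaling-factor idea from the abstract above can be shown in isolation: quantize a weight tensor to b-bit integer codes with a single scale constrained to a power of two, so dequantization reduces to a bit shift in hardware. The de-biasing (WD/FD) strategies of the paper are not reproduced in this sketch.

```python
import torch

def pot_quantize(w: torch.Tensor, bits: int = 8):
    """Quantize a tensor with one shared power-of-two scaling factor.
    Returns integer codes and the scale (a power of two), so w ≈ codes * scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = 2.0 ** torch.ceil(torch.log2(w.abs().max() / qmax))  # shared POT scale
    codes = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return codes.to(torch.int8), scale

w = torch.randn(64, 64)
codes, scale = pot_quantize(w)
print("scale:", scale.item(),
      "max abs error:", (w - codes.float() * scale).abs().max().item())
```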
18. Remote Sensing Scene Classification Based on Attention Network Scale Feature Fusion [基于注意力网络尺度特征融合的遥感场景分类].
- Author
-
Tie, Jun, Xiao, Pengfei, Zheng, Lu, Ma, Hairong, and Peng, Dan
- Abstract
No abstract is provided in this record; copyright held by Journal of South-Central Minzu University (Natural Science Edition).
- Published
- 2024
- Full Text
- View/download PDF
19. Self-supervised in-domain representation learning for remote sensing image scene classification
- Author
-
Ali Ghanbarzadeh and Hossein Soleimani
- Subjects
Transfer learning ,Deep learning ,Remote sensing ,Self-supervised learning ,Representation learning ,Scene classification ,Science (General) ,Q1-390 ,Social sciences (General) ,H1-99 - Abstract
Transferring the ImageNet pre-trained weights to the various remote sensing tasks has produced acceptable results and reduced the need for labeled samples. However, the domain differences between ground imageries and remote sensing images cause the performance of such transfer learning to be limited. The difficulty of annotating remote sensing images is well-known as it requires domain experts and more time, whereas unlabeled data is readily available. Recently, self-supervised learning, which is a subset of unsupervised learning, emerged and significantly improved representation learning. Recent research has demonstrated that self-supervised learning methods capture visual features that are more discriminative and transferable than the supervised ImageNet weights. We are motivated by these facts to pre-train the in-domain representations of remote sensing imagery using contrastive self-supervised learning and transfer the learned features to other related remote sensing datasets. Specifically, we used the SimSiam algorithm to pre-train the in-domain knowledge of remote sensing datasets and then transferred the obtained weights to the other scene classification datasets. Thus, we have obtained state-of-the-art results on five land cover classification datasets with varying numbers of classes and spatial resolutions. In addition, by conducting appropriate experiments, including feature pre-training using datasets with different attributes, we have identified the most influential factors that make a dataset a good choice for obtaining in-domain features. We have transferred the features obtained by pre-training SimSiam on remote sensing datasets to various downstream tasks and used them as initial weights for fine-tuning. Moreover, we have linearly evaluated the obtained representations in cases where the number of samples per class is limited. Our experiments have demonstrated that using a higher-resolution dataset during the self-supervised pre-training stage results in learning more discriminative and general representations.
- Published
- 2024
- Full Text
- View/download PDF
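For reference, the core SimSiam objective used for the in-domain pre-training described above is a symmetric negative cosine similarity with a stop-gradient on the target branch. The projector/predictor dimensions below are illustrative; the backbone and augmentation pipeline are omitted.

```python
import torch
import torch.nn.functional as F

def simsiam_loss(p1, p2, z1, z2):
    """Symmetric negative cosine similarity with stop-gradient on the targets,
    the core objective SimSiam optimizes between two augmented views."""
    def d(p, z):
        return -F.cosine_similarity(p, z.detach(), dim=1).mean()  # stop-gradient on z
    return 0.5 * d(p1, z2) + 0.5 * d(p2, z1)

# z* are projector outputs of two augmented views, p* the predictor outputs
# (random stand-ins here; in practice they come from the backbone + MLP heads)
z1, z2 = torch.randn(16, 2048), torch.randn(16, 2048)
predictor = torch.nn.Sequential(torch.nn.Linear(2048, 512), torch.nn.ReLU(),
                                torch.nn.Linear(512, 2048))
print(simsiam_loss(predictor(z1), predictor(z2), z1, z2))
```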
20. Rough Set, ELM Classifier and Deep Architecture for Remote Sensing Images
- Author
-
Sharma, Neeta, Sindal, Ravi, Meher, Saroj K., Tsihrintzis, George A., Series Editor, Virvou, Maria, Series Editor, Jain, Lakhmi C., Series Editor, Dehuri, Satchidananda, editor, Cho, Sung-Bae, editor, Padhy, Venkat Prasad, editor, Shanmugam, Poonkuntrun, editor, and Ghosh, Ashish, editor
- Published
- 2024
- Full Text
- View/download PDF
21. PReLim: A Modeling Paradigm for Remote Sensing Image Scene Classification Under Limited Labeled Samples
- Author
-
Dutta, Suparna, Das, Monidipa, Hartmanis, Juris, Founding Editor, Goos, Gerhard, Series Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Ghosh, Ashish, editor, King, Irwin, editor, Bhattacharyya, Malay, editor, Sankar Ray, Shubhra, editor, and K. Pal, Sankar, editor
- Published
- 2024
- Full Text
- View/download PDF
22. Multi-patch Adversarial Attack for Remote Sensing Image Classification
- Author
-
Wang, Ziyue, Huang, Jun-Jie, Liu, Tianrui, Chen, Zihan, Zhao, Wentao, Liu, Xiao, Pan, Yi, Liu, Lin, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Song, Xiangyu, editor, Feng, Ruyi, editor, Chen, Yunliang, editor, Li, Jianxin, editor, and Min, Geyong, editor
- Published
- 2024
- Full Text
- View/download PDF
23. Leveled Approach of Context Setting in Semantic Understanding of Remote Sensing Images
- Author
-
Ahuja, Stuti, Patil, Sonali, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Oneto, Luca, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Zhang, Junjie James, Series Editor, Tan, Kay Chen, Series Editor, Shaw, Rabindra Nath, editor, Siano, Pierluigi, editor, Makhilef, Saad, editor, Ghosh, Ankush, editor, and Shimi, S. L., editor
- Published
- 2024
- Full Text
- View/download PDF
24. Unveiling the Influence of Image Super-Resolution on Aerial Scene Classification
- Author
-
Ramzy Ibrahim, Mohamed, Benavente, Robert, Ponsa, Daniel, Lumbreras, Felipe, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Vasconcelos, Verónica, editor, Domingues, Inês, editor, and Paredes, Simão, editor
- Published
- 2024
- Full Text
- View/download PDF
25. Remote Sensing Image Classification Based on Canny Operator Enhanced Edge Features.
- Author
-
Zhou, Mo, Zhou, Yue, Yang, Dawei, and Song, Kai
- Subjects
- *
IMAGE recognition (Computer vision) , *REMOTE sensing , *EDGE detection (Image processing) , *FEATURE extraction , *DATA mining , *NAIVE Bayes classification - Abstract
Remote sensing image classification plays a crucial role in the field of remote sensing interpretation. With the exponential growth of multi-source remote sensing data, accurately extracting target features and comprehending target attributes from complex images significantly impacts classification accuracy. To address these challenges, we propose a Canny edge-enhanced multi-level attention feature fusion network (CAF) for remote sensing image classification. The original image is specifically inputted into a convolutional network for the extraction of global features, while increasing the depth of the convolutional layer facilitates feature extraction at various levels. Additionally, to emphasize detailed target features, we employ the Canny operator for edge information extraction and utilize a convolution layer to capture deep edge features. Finally, by leveraging the Attentional Feature Fusion (AFF) network, we fuse global and detailed features to obtain more discriminative representations for scene classification tasks. The performance of our proposed method (CAF) is evaluated through experiments conducted across three openly accessible datasets for classifying scenes in remote sensing images: NWPU-RESISC45, UCM, and MSTAR. The experimental findings indicate that our approach based on incorporating edge detail information outperforms methods relying solely on global feature-based classifications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
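A stripped-down version of the edge-enhanced fusion described above: extract a Canny edge map with OpenCV, pass it through a small convolutional branch, and fuse it with a global-image branch. Simple addition stands in here for the paper's Attentional Feature Fusion (AFF), and all layer sizes and thresholds are assumptions.

```python
import cv2
import numpy as np
import torch
import torch.nn as nn

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)   # placeholder RGB patch
edges = cv2.Canny(cv2.cvtColor(image, cv2.COLOR_RGB2GRAY), 100, 200)

global_branch = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(8))
edge_branch = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(8))

x_img = torch.from_numpy(image).permute(2, 0, 1).float().unsqueeze(0) / 255.0
x_edge = torch.from_numpy(edges).float().unsqueeze(0).unsqueeze(0) / 255.0

# Additive fusion of global and edge-detail features (a stand-in for AFF)
fused = global_branch(x_img) + edge_branch(x_edge)
print(fused.shape)  # torch.Size([1, 16, 8, 8])
```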
26. Federated Learning Approach for Remote Sensing Scene Classification.
- Author
-
Ben Youssef, Belgacem, Alhmidi, Lamyaa, Bazi, Yakoub, and Zuair, Mansour
- Subjects
- *
FEDERATED learning , *REMOTE sensing , *TRANSFORMER models , *DISTANCE education , *CLASSIFICATION - Abstract
In classical machine learning algorithms, used in many analysis tasks, the data are centralized for training. That is, both the model and the data are housed within one device. Federated learning (FL), on the other hand, is a machine learning technique that breaks away from this traditional paradigm by allowing multiple devices to collaboratively train a model without each sharing their own data. In a typical FL setting, each device has a local dataset and trains a local model on that dataset. The local models are next aggregated at a central server to produce a global model. The global model is then distributed back to the devices, which update their local models accordingly. This process is repeated until the global model converges. In this article, a FL approach is applied for remote sensing scene classification for the first time. The adopted approach uses three different RS datasets while employing two types of CNN models and two types of Vision Transformer models, namely: EfficientNet-B1, EfficientNet-B3, ViT-Tiny, and ViT-Base. We compare the performance of FL in each model in terms of overall accuracy and undertake additional experiments to assess their robustness when faced with scenarios of dropped clients. Our classification results on test data show that the two considered Transformer models outperform the two models from the CNN family. Furthermore, employing FL with ViT-Base yields the highest accuracy levels even when the number of dropped clients is significant, indicating its high robustness. These promising results point to the notion that FL can be successfully used with ViT models in the classification of RS scenes, whereas CNN models may suffer from overfitting problems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
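The aggregation step of the federated setting described above is typically FedAvg: a size-weighted average of the clients' model weights becomes the new global model. The sketch below shows one such round with an EfficientNet-B1; the client training loops and the dropped-client experiments are omitted, and the client sizes are invented.

```python
import copy
import torch
from torchvision.models import efficientnet_b1

def fedavg(client_states, client_sizes):
    """Size-weighted average of client state_dicts (one FedAvg aggregation round)."""
    total = float(sum(client_sizes))
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        stacked = torch.stack([s[key].float() * (n / total)
                               for s, n in zip(client_states, client_sizes)])
        avg[key] = stacked.sum(0).to(avg[key].dtype)
    return avg

global_model = efficientnet_b1(num_classes=45)
# In a real round each client fine-tunes a copy on its local data; here the
# local models are untrained copies just to exercise the aggregation step.
clients = [copy.deepcopy(global_model).state_dict() for _ in range(3)]
global_model.load_state_dict(fedavg(clients, client_sizes=[120, 200, 80]))
```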
27. Cross-Domain Classification Based on Frequency Component Adaptation for Remote Sensing Images.
- Author
-
Zhu, Peng, Zhang, Xiangrong, Han, Xiao, Cheng, Xina, Gu, Jing, Chen, Puhua, and Jiao, Licheng
- Subjects
- *
CLASSIFICATION , *FEATURE extraction , *KNOWLEDGE transfer - Abstract
Cross-domain scene classification requires the transfer of knowledge from labeled source domains to unlabeled target domain data to improve its classification performance. This task can reduce the labeling cost of remote sensing images and improve the generalization ability of models. However, the huge distributional gap between labeled source domains and unlabeled target domains acquired by different scenes and different sensors is a core challenge. Existing cross-domain scene classification methods focus on designing better distributional alignment constraints, but are under-explored for fine-grained features. We propose a cross-domain scene classification method called the Frequency Component Adaptation Network (FCAN), which considers low-frequency features and high-frequency features separately for more comprehensive adaptation. Specifically, the features are refined and aligned separately through a high-frequency feature enhancement module (HFE) and a low-frequency feature extraction module (LFE). We conducted extensive transfer experiments on 12 cross-scene tasks between the AID, CLRS, MLRSN, and RSSCN7 datasets, as well as two cross-sensor tasks between the NWPU-RESISC45 and NaSC-TG2 datasets, and the results show that the FCAN can effectively improve the model's performance for scene classification on unlabeled target domains compared to other methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
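The low-/high-frequency decomposition that the LFE and HFE modules above operate on can be illustrated with a simple FFT radius mask, as below; the cut-off radius is an arbitrary assumption, and the alignment losses of FCAN are not shown.

```python
import numpy as np

def frequency_split(gray: np.ndarray, radius: int = 16):
    """Split a grayscale image into low- and high-frequency components using
    a circular mask in the centered 2-D Fourier spectrum."""
    f = np.fft.fftshift(np.fft.fft2(gray))
    h, w = gray.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h / 2) ** 2 + (xx - w / 2) ** 2 <= radius ** 2
    low = np.fft.ifft2(np.fft.ifftshift(f * mask)).real       # keep the spectrum center
    high = np.fft.ifft2(np.fft.ifftshift(f * (~mask))).real   # keep everything else
    return low, high

img = np.random.rand(128, 128)
low, high = frequency_split(img)
print(np.allclose(low + high, img))   # True: the two components sum back to the image
```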
28. Focal-Hinge Loss based Deep Hybrid Framework for imbalanced Remote Sensing Scene Classification.
- Author
-
KUMARI, NEHA and MINZ, SONAJHARIA
- Subjects
MACHINE learning ,REMOTE sensing ,CONVOLUTIONAL neural networks ,FEATURE extraction ,DEEP learning ,URBAN planning - Abstract
Deep learning models have achieved a significant breakthrough in remote sensing scene classification due to their discriminative, hierarchical feature extraction ability. Nevertheless, CNN-based methods deliver accurate classification results only with sufficient annotated training samples. The computational bottleneck of CNNs with numerous parameters in the case of inadequate training samples, and the inherent class imbalance problem in high-resolution satellite scene classification, question the classifier's performance. Existing deep CNNs with the conventional Cross-Entropy loss function neglect the significance of the gradient contribution from minority classes in handling imbalanced LULC class distributions. In this context, we propose the hybrid probabilistic gradient-based deep learning framework CNN-FHSVM with a regularized novel Focal-Hinge loss cost function optimization for alleviating misclassifications in imbalanced datasets. Empirical experimentation with the Sentinel-2 EuroSAT dataset, benchmarked for deep learning algorithms, demonstrated that the proposed model is superior in mitigating classification errors under imbalanced class distributions compared with cutting-edge deep learning frameworks. The proposed loss function adaptively updates the gradient of the minority classes, drifting the focus to misclassified scenes. Focal-Hinge loss is the first endeavor adapted to remote sensing LULC multiclass classification to reduce misclassifications. The model demonstrates higher accuracy with reduced misclassifications and training time and can benefit other remote sensing applications like early deforestation detection and urban planning, where LULC maps are imbalanced. [ABSTRACT FROM AUTHOR]
- Published
- 2024
29. INCEPTION SH: A NEW CNN MODEL BASED ON INCEPTION MODULE FOR CLASSIFYING SCENE IMAGES.
- Author
-
METLEK, Sedat and ÇETİNER, Halit
- Subjects
STRUCTURAL optimization ,DRONE aircraft ,AUTONOMOUS vehicles - Abstract
No abstract is provided in this record; copyright held by SDU Journal of Engineering Sciences & Design (Mühendislik Bilimleri ve Tasarım Dergisi).
- Published
- 2024
- Full Text
- View/download PDF
30. Transformer based ensemble deep learning approach for remote sensing natural scene classification.
- Author
-
Sivasubramanian, Arrun, VR, Prashanth, V, Sowmya, and Ravi, Vinayakumar
- Subjects
- *
DEEP learning , *REMOTE sensing , *FEDERATED learning , *TRANSFORMER models , *IMAGE recognition (Computer vision) , *DISTANCE education - Abstract
Very high resolution (VHR) remote sensing (RS) image classification is paramount for detailed Earth's surface analysis. Feature extraction from VHR natural scenes is crucial, but it becomes a challenging task because of the overlapping edges present in images. Multiple open-source datasets exist in the literature to train robust models, and they have been benchmarked using deep learning models. However, different datasets contain different numbers of classes; a few of them could be absent in other datasets because they are independent of the other classes, and the classes are sometimes not mutually exclusive amongst the same dataset. Thus, it is very challenging to generalize a model trained on a single dataset to perform scene classification on unknown classes of multiple benchmark datasets and real-time images. Thus, this work introduces the Remote Sensing Natural Scenes 92 (RS_NS92) dataset, consisting of 36,785 images belonging to 92 classes, curated by selectively taking the union of all subclasses from five benchmark datasets. This class count is significantly higher than publicly available datasets and maintains a low-class imbalance and a comprehensive data distribution for robust model training. It also provides the remote sensing community with an extra platform to validate the performance on multiple benchmarks. Inspired by federated learning, an ensemble approach consisting of three feature extraction backbones: Vision Transformers, Swin Transformers, and ConvNeXt (termed the VSC_Ensemble model) is also introduced. This model can make extraordinary predictions across multiple datasets by finetuning weights using transfer learning. Experimental analysis with the proposed approach not only obtains a high test accuracy of 97.24% and an F1-score of 0.9587 for the 92 classes on a 90:10 split of the proposed benchmark dataset but also gets excellent results on unseen test images of other datasets, which are comparable to the state-of-the-art results. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
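A bare-bones version of the three-backbone ensemble named in the abstract above (ViT, Swin, ConvNeXt) can be assembled with timm and simple softmax averaging, as sketched below with randomly initialized heads for the 92 classes; the paper's transfer-learning and fine-tuning recipe is not reproduced.

```python
import timm
import torch

# Three backbone families named in the abstract, each re-headed for 92 classes.
# Softmax averaging stands in for the paper's ensembling/fine-tuning procedure.
names = ["vit_base_patch16_224", "swin_tiny_patch4_window7_224", "convnext_tiny"]
models = [timm.create_model(n, pretrained=False, num_classes=92).eval() for n in names]

def ensemble_predict(x: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        probs = torch.stack([m(x).softmax(dim=1) for m in models])
    return probs.mean(dim=0).argmax(dim=1)       # average probabilities, then pick the class

print(ensemble_predict(torch.randn(2, 3, 224, 224)))
```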
31. Towards exploiting believe function theory for object based scene classification problem.
- Author
-
Amirat, Anfel, Benrais, Lamine, and Baha, Nadia
- Abstract
Scene classification is one of the active research domains of artificial intelligence (AI) with many real-world applications. This paper presents a new scene classification approach based on the Belief Function Theory, which provides a more effective way of handling uncertainty information compared to traditional probability-based methods. Unlike previous methods that rely on probabilities, which have proved their limitations, the main contribution of our approach is the use of belief degrees to classify unknown scenes based on object labels. We conduct experiments on three well-known datasets (SUN397, MIT Indoor, and LabelMe) and compare our results with state-of-the-art methods. Our approach achieves competitive results with a simple and robust framework that outperforms previous methods in some cases. We also provide insights into the strengths and limitations of our approach and discuss potential future directions for research. Overall, our work demonstrates the effectiveness of the Belief Function theory in scene classification and opens up new avenues for further research and innovation in this area. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
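The basic operation behind the Belief Function (Dempster–Shafer) reasoning used above is Dempster's rule of combination. The sketch below combines two mass functions over scene hypotheses derived from detected objects; the masses and the frame of discernment are invented for illustration and are not the paper's belief assignments.

```python
from itertools import product

def dempster_combine(m1: dict, m2: dict) -> dict:
    """Combine two mass functions (keys are frozensets of hypotheses) with
    Dempster's rule: multiply masses, drop conflicting (empty) intersections,
    and renormalize by 1 - conflict."""
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Two "object detectors" give evidence about the scene (illustrative numbers)
m_desk = {frozenset({"office"}): 0.6, frozenset({"office", "library"}): 0.4}
m_book = {frozenset({"library"}): 0.5, frozenset({"office", "library"}): 0.5}
print(dempster_combine(m_desk, m_book))
```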
32. Location-aware multi-code generator for remote sensing scene classification.
- Author
-
Bian, Xiaoyong, Yang, Zufang, Hu, Chengsong, Peng, Min, and Tang, Jinshan
- Subjects
- *
CONVOLUTIONAL neural networks , *CODE generators , *CLASSIFICATION , *REMOTE sensing , *OBJECT recognition (Computer vision) , *WIRELESS mesh networks - Abstract
Many types of objects are crowdedly distributed on the surface of remote sensing scenes. Global semantics with object mixture and small training samples potentially cause difficulties in the classification of remote sensing scenes. The description of them often requires crucial parts of the scenes and corresponding discriminative features, especially as the convolutional neural network (CNN) goes deeper. In this paper, we propose a novel Location-Aware Multi-code Generator network (LAM-GAN) that incorporates multiple latent codes as the input to a generator network. This network is designed to train from scratch and recover most details of the input (real) image in a principled way. Meanwhile, multiple latent codes are reversely updated using K cluster centres located by the subsequently proposed part co-location module. By doing so, the global features of the real-fake image pair and part-level features are stacked and fed to a joint part classification network for discriminative classification. This approach makes it easier to induce the semantic concepts in a remote sensing scene. With this formulation, our approach generates an internal compact representation of the scene and enables weakly supervised part co-localization. The proposed method provides a unified framework for not only generating high-quality fake images but also facilitating the remote sensing scene classification task. We evaluated LAM-GAN on several benchmark datasets, and the experiment results demonstrate that the proposed method is more effective than previous state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
33. Advancements in remote sensing: Harnessing the power of artificial intelligence for scene image classification.
- Author
-
Khadidos, Alaa O.
- Subjects
IMAGE recognition (Computer vision) ,ARTIFICIAL neural networks ,REMOTE sensing ,ARTIFICIAL intelligence ,CONVOLUTIONAL neural networks ,NATURAL ventilation - Abstract
The Remote Sensing Scene Image Classification (RSSIC) procedure is involved in the categorization of the Remote Sensing Images (RSI) into sets of semantic classes depending upon the content and this procedure plays a vital role in extensive range of applications, like environment monitoring, urban planning, vegetation mapping, natural hazards' detection and geospatial object detection. The RSSIC procedure exploits Artificial Intelligence (AI) technology, mostly Machine Learning (ML) techniques, for automatic analysis and categorization of the content, present in these images. The purpose is to recognize and differentiate the land cover classes or features in the scene, namely crops, forests, buildings, water bodies, roads, and other natural and man-made structures. RSSIC, using Deep Learning (DL) techniques, has attracted a considerable attention and accomplished important breakthroughs, thanks to the great feature learning abilities of the Deep Neural Networks (DNNs). In this aspect, the current study presents the White Shark Optimizer with DL-driven RSSIC (WSODL-RSSIC) technique. The presented WSODL-RSSIC technique mainly focuses on detection and classification of the remote sensing images under various class labels. In the WSODL-RSSIC technique, the deep Convolutional Neural Network (CNN)-based ShuffleNet model is used to produce the feature vectors. Moreover, the Deep Multilayer Neural network (DMN) classifiers are utilized for recognition and classification of the remote sensing images. Furthermore, the WSO technique is used to optimally adjust the hyperparameters of the DMN classifier. The presented WSODL-RSSIC method was simulated for validation using the remote-sensing image databases. The experimental outcomes infer that the WSODL-RSSIC model achieved improved results in comparison with the current approaches under different evaluation metrics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
34. Improving remote sensing scene classification using dung Beetle optimization with enhanced deep learning approach
- Author
-
Mohammad Alamgeer, Alanoud Al Mazroa, Saud S. Alotaibi, Meshari H. Alanazi, Mohammed Alonazi, and Ahmed S. Salama
- Subjects
Remote sensing images ,Scene classification ,Deep learning ,Dung beetle optimization ,Transfer learning ,Science (General) ,Q1-390 ,Social sciences (General) ,H1-99 - Abstract
Remote sensing (RS) scene classification has received significant consideration because of its extensive use by the RS community. Scene classification in satellite images has widespread uses in remote surveillance, environmental observation, remote scene analysis, urban planning, and earth observations. Because of the immense benefits of the land scene classification task, various approaches have been presented recently for automatically classifying land scenes from remote sensing images (RSIs). Several approaches dependent upon convolutional neural networks (CNNs) are presented for classifying brutal RS scenes; however, they could only partially capture the context from RSIs due to the problematic texture, cluttered context, tiny size of objects, and considerable differences in object scale. This article designs a Remote Sensing Scene Classification using Dung Beetle Optimization with Enhanced Deep Learning (RSSC-DBOEDL) approach. The purpose of the RSSC-DBOEDL technique is to categorize different varieties of scenes that exist in the RSI. In the presented RSSC-DBOEDL technique, the enhanced MobileNet model is primarily deployed as a feature extractor. The DBO method could be implemented in this study for hyperparameter tuning of the enhanced MobileNet model. The RSSC-DBOEDL technique uses a multi-head attention-based long short-term memory (MHA-LSTM) technique to classify the scenes in the RSI. The simulation evaluation of the RSSC-DBOEDL approach has been examined under the benchmark RSI datasets. The simulation results of the RSSC-DBOEDL approach exhibited a more excellent accuracy outcome of 98.75 % and 95.07 % under UC Merced and EuroSAT datasets with other existing methods regarding distinct measures.
- Published
- 2024
- Full Text
- View/download PDF
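The entry above pairs a pretrained MobileNet feature extractor with a multi-head attention-based LSTM classifier. A minimal PyTorch sketch of that general pattern follows: the backbone's spatial feature map is flattened into a token sequence and passed through a bidirectional LSTM and multi-head self-attention before pooling. The class count, hidden sizes, and the use of plain MobileNetV2 (rather than the paper's "enhanced" MobileNet) are illustrative assumptions, and the dung beetle optimization step for hyperparameter tuning is omitted.

```python
# Sketch only: MobileNet features -> multi-head-attention LSTM classifier (assumed sizes).
import torch
import torch.nn as nn
from torchvision import models

class MHALSTMClassifier(nn.Module):
    def __init__(self, num_classes=21, hidden=256, heads=4):
        super().__init__()
        backbone = models.mobilenet_v2(weights="DEFAULT")
        self.features = backbone.features            # (B, 1280, H/32, W/32) feature map
        self.lstm = nn.LSTM(1280, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden, heads, batch_first=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):
        f = self.features(x)                          # spatial feature map
        seq = f.flatten(2).transpose(1, 2)            # (B, H*W, 1280) token sequence
        out, _ = self.lstm(seq)                       # contextual encoding of the tokens
        att, _ = self.attn(out, out, out)             # multi-head self-attention
        return self.head(att.mean(dim=1))             # pooled logits

logits = MHALSTMClassifier()(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 21])
```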
35. Universal adversarial defense in remote sensing based on pre-trained denoising diffusion models
- Author
-
Weikang Yu, Yonghao Xu, and Pedram Ghamisi
- Subjects
Adversarial defense ,Adversarial examples ,Diffusion models ,Remote sensing ,Scene classification ,Semantic segmentation ,Physical geography ,GB3-5030 ,Environmental sciences ,GE1-350 - Abstract
Deep neural networks (DNNs) have risen to prominence as key solutions in numerous AI applications for earth observation (AI4EO). However, their susceptibility to adversarial examples poses a critical challenge, compromising the reliability of AI4EO algorithms. This paper presents a novel Universal Adversarial Defense approach in Remote Sensing Imagery (UAD-RS), leveraging pre-trained diffusion models to protect DNNs against various adversarial examples exhibiting heterogeneous adversarial patterns. Specifically, a universal adversarial purification framework is developed utilizing pre-trained diffusion models to mitigate adversarial perturbations through the introduction of Gaussian noise and subsequent purification of the perturbations from adversarial examples. Additionally, an Adaptive Noise Level Selection (ANLS) mechanism is introduced to determine the optimal noise level for the purification framework with a task-guided Fréchet Inception Distance (FID) ranking strategy, thereby enhancing purification performance. Consequently, only a single pre-trained diffusion model is required for purifying various adversarial examples with heterogeneous adversarial patterns across each dataset, significantly reducing training efforts for multiple attack settings while maintaining high performance without prior knowledge of adversarial perturbations. Experimental results on four heterogeneous RS datasets, focusing on scene classification and semantic segmentation, demonstrate that UAD-RS outperforms state-of-the-art adversarial purification approaches, providing universal defense against seven commonly encountered adversarial perturbations. Codes and the pre-trained models are available online (https://github.com/EricYu97/UAD-RS).
- Published
- 2024
- Full Text
- View/download PDF
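The purification idea in the entry above (perturb, then denoise with a pretrained diffusion model) can be sketched generically. The snippet below assumes a hypothetical noise predictor `eps_model`, applies a standard DDPM forward step at a chosen noise level, and performs a single deterministic reconstruction of the clean image estimate. The actual UAD-RS pipeline, its adaptive noise-level selection via FID ranking, and its pretrained models are not reproduced here.

```python
# Sketch only: diffusion-style purification of a (possibly adversarial) image batch.
# `eps_model` is a placeholder noise predictor; real pipelines use pretrained diffusion models.
import torch

def purify(x, eps_model, t_star=100, T=1000):
    betas = torch.linspace(1e-4, 0.02, T)                  # standard DDPM noise schedule
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)[t_star]
    noise = torch.randn_like(x)
    x_t = alpha_bar.sqrt() * x + (1 - alpha_bar).sqrt() * noise   # forward diffusion to t*
    eps_hat = eps_model(x_t, torch.full((x.shape[0],), t_star))   # predict the injected noise
    x0_hat = (x_t - (1 - alpha_bar).sqrt() * eps_hat) / alpha_bar.sqrt()  # one-shot x0 estimate
    return x0_hat.clamp(0, 1)

# Toy placeholder denoiser so the sketch runs end to end.
eps_model = lambda x_t, t: torch.zeros_like(x_t)
clean_estimate = purify(torch.rand(2, 3, 256, 256), eps_model)
print(clean_estimate.shape)
```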
36. Advancements in remote sensing: Harnessing the power of artificial intelligence for scene image classification
- Author
-
Alaa O. Khadidos
- Subjects
remote sensing images ,artificial intelligence ,white shark optimizer ,scene classification ,deep learning ,Mathematics ,QA1-939 - Abstract
The Remote Sensing Scene Image Classification (RSSIC) procedure categorizes Remote Sensing Images (RSI) into sets of semantic classes based on their content, and it plays a vital role in an extensive range of applications, such as environment monitoring, urban planning, vegetation mapping, natural hazard detection, and geospatial object detection. The RSSIC procedure exploits Artificial Intelligence (AI) technology, mostly Machine Learning (ML) techniques, to automatically analyse and categorize the content present in these images. The purpose is to recognize and differentiate the land cover classes or features in the scene, namely crops, forests, buildings, water bodies, roads, and other natural and man-made structures. RSSIC using Deep Learning (DL) techniques has attracted considerable attention and accomplished important breakthroughs, thanks to the strong feature learning abilities of Deep Neural Networks (DNNs). In this context, the current study presents the White Shark Optimizer with DL-driven RSSIC (WSODL-RSSIC) technique. The presented WSODL-RSSIC technique mainly focuses on the detection and classification of remote sensing images under various class labels. In the WSODL-RSSIC technique, the deep Convolutional Neural Network (CNN)-based ShuffleNet model is used to produce the feature vectors. Moreover, the Deep Multilayer Neural network (DMN) classifier is utilized for recognition and classification of the remote sensing images. Furthermore, the WSO technique is used to optimally adjust the hyperparameters of the DMN classifier. The presented WSODL-RSSIC method was validated through simulations on remote-sensing image databases. The experimental outcomes indicate that the WSODL-RSSIC model achieved improved results in comparison with the current approaches under different evaluation metrics.
- Published
- 2024
- Full Text
- View/download PDF
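The WSODL-RSSIC entry above couples a ShuffleNet feature extractor with a deep multilayer network (DMN) classifier. A minimal sketch of that two-stage design follows: a frozen pretrained ShuffleNet backbone emits feature vectors and a small MLP classifies them. Layer widths, the number of classes, and freezing the backbone are illustrative assumptions; the White Shark Optimizer step that tunes the classifier's hyperparameters is not shown.

```python
# Sketch only: ShuffleNet feature vectors feeding a deep multilayer (MLP) classifier.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.shufflenet_v2_x1_0(weights="DEFAULT")
backbone.fc = nn.Identity()                 # expose the 1024-d feature vector
for p in backbone.parameters():
    p.requires_grad = False                 # use the backbone purely as a feature extractor

dmn = nn.Sequential(                        # assumed "deep multilayer network" head
    nn.Linear(1024, 512), nn.ReLU(),
    nn.Linear(512, 128), nn.ReLU(),
    nn.Linear(128, 45),                     # e.g. 45 scene classes
)

with torch.no_grad():
    feats = backbone(torch.randn(4, 3, 224, 224))
print(dmn(feats).shape)                     # torch.Size([4, 45])
```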
37. PPLM-Net: Partial Patch Local Masking Net for Remote Sensing Image Unsupervised Domain Adaptation Classification
- Author
-
Junsong Leng, Zhong Chen, Haodong Mu, Tianhang Liu, Hanruo Chen, and Guoyou Wang
- Subjects
Domain adversarial training (DAT) ,PPLM-net ,remote sensing image ,scene classification ,unsupervised domain adaptation (UDA) ,Ocean engineering ,TC1501-1800 ,Geophysics. Cosmic physics ,QC801-809 - Abstract
In remote sensing image classification tasks, a model trained on one dataset (source domain) is often applied to another dataset (target domain). However, due to the presence of domain shift between these domains, where data are not independent and identically distributed, the performance of the model typically deteriorates. Domain adaptation aims to improve the generalization performance of the model in the target domain. In response to the challenges of intricate backgrounds, domain shift, and potentially unlabeled target domains in remote sensing images, this article proposes a network specifically designed for unsupervised domain adaptation (UDA) classification of remote sensing images, named PPLM-net. The network consists of a domain adversarial training (DAT) module, a partial patch local masking (PPLM) module, and a teacher–student network module. The DAT module enables the network to extract domain-invariant features. The PPLM module compels the model to focus on the global information of target-domain remote sensing images with intricate backgrounds, learning contextual content to improve model performance. The teacher network generates pseudo-labels for completely unlabeled target-domain images, and the student network is trained with a PPLM target-domain classification loss to generate robust and discriminative features. We construct a dataset dedicated to the UDA scene classification task of remote sensing images, named RSDA. We collect images from four publicly available datasets spanning seven common categories, containing over 10,000 images. Compared with current state-of-the-art UDA models, PPLM-net achieves the best results in 12 domain adaptation classification tasks on RSDA. The average accuracy reaches 99.115%.
- Published
- 2024
- Full Text
- View/download PDF
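The domain adversarial training (DAT) component mentioned in the entry above is commonly implemented with a gradient reversal layer, which passes features forward unchanged but flips the gradient sign so the feature extractor is pushed toward domain-invariant representations. The sketch below shows that standard building block only; it is not the authors' full PPLM-net, and the feature dimension is an assumption.

```python
# Sketch only: gradient reversal layer, the usual core of domain adversarial training.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)                  # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None  # reversed (scaled) gradient in the backward pass

class DomainDiscriminator(nn.Module):
    def __init__(self, dim=512, lam=1.0):
        super().__init__()
        self.lam = lam
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, 2))

    def forward(self, feats):
        return self.net(GradReverse.apply(feats, self.lam))

disc = DomainDiscriminator()
domain_logits = disc(torch.randn(8, 512, requires_grad=True))
domain_logits.sum().backward()               # gradients reaching `feats` are sign-flipped
```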
38. A Deeper Look Into Remote Sensing Scene Image Misclassification by CNNs
- Author
-
Anas Tukur Balarabe and Ivan Jordanov
- Subjects
Image similarity metrics ,Euclidean distance ,local binary pattern ,transfer learning ,scene classification ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
As deeper and lighter variations of convolutional neural networks (CNNs) continue to break accuracy and efficiency records, their applications for solving domain-specific challenges continue to widen, particularly in computer vision and pattern recognition. The feat achieved by these end-to-end learning models can be attributed to their ability to extract local and global discriminative features for effective classification. However, in land use and land cover classification (LULC), inner-class variability and outer-class similarity could cause a classifier to confuse one image's discriminative features with another's, leading to inefficiency and poor classification. In this work, we deviate from the conventional approach of classifying high-resolution remote sensing (HRRS) images by proposing a framework for comparing and combining images of different simple classes into superclasses based on spatial, textural, and colour similarities. To achieve this, we implement the Bhattacharyya metric for colour-based similarity analysis, and a combination of Local Binary Patterns (LBPs), the Earth Mover's Distance, and the Euclidean Distance for texture and spatial similarity analysis, in addition to the structural similarity index (SSIM). A pre-trained CNN model (Xception) is then fine-tuned to classify the superclasses and the original classes of the Aerial Image Dataset (AID), the UC Merced, the Optical Image Analysis and Learning (OPTIMAL-31), and NWPU-RESISC45 datasets. Results show that methodically combining overlapping classes into superclasses reduces the possibility of misclassifications and increases the efficiency of CNNs. The model evaluation further indicates that this approach can boost classifiers' robustness and significantly reduce the impact of inner-class variability and outer-class similarity on their performance.
- Published
- 2024
- Full Text
- View/download PDF
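The similarity measures listed in the entry above (colour, texture, and structural) all have standard library implementations. The snippet below sketches how such pairwise scores might be computed for two images using OpenCV, scikit-image, and SciPy; the histogram sizes are assumptions, and the thresholds the paper uses to merge classes into superclasses are not reproduced.

```python
# Sketch only: per-pair similarity scores of the kind used to group classes into superclasses.
import cv2
import numpy as np
from scipy.stats import wasserstein_distance
from skimage.feature import local_binary_pattern
from skimage.metrics import structural_similarity

def similarity_scores(img_a, img_b):
    # Colour: Bhattacharyya distance between hue histograms.
    def hue_hist(img):
        hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
        h = cv2.calcHist([hsv], [0], None, [32], [0, 180])
        return cv2.normalize(h, h).flatten()
    colour_d = cv2.compareHist(hue_hist(img_a), hue_hist(img_b), cv2.HISTCMP_BHATTACHARYYA)

    # Texture: Earth Mover's Distance between uniform-LBP histograms.
    def lbp_hist(img):
        grey = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        lbp = local_binary_pattern(grey, P=8, R=1, method="uniform")
        hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
        return hist
    ha, hb = lbp_hist(img_a), lbp_hist(img_b)
    texture_d = wasserstein_distance(np.arange(10), np.arange(10), ha, hb)

    # Structure: SSIM on greyscale versions.
    ssim = structural_similarity(cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY),
                                 cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY), data_range=255)
    return colour_d, texture_d, ssim

a = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
b = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
print(similarity_scores(a, b))
```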
39. STMNet: Scene Classification-Assisted and Texture Feature-Enhanced Multiscale Network for Large-Scale Urban Informal Settlement Extraction From Remote Sensing Images
- Author
-
Shouhang Du, Jianghe Xing, Shaoyu Wang, Liguang Wei, and Yirui Zhang
- Subjects
Handcrafted texture feature (HTF) ,high-resolution remote sensing image (HRI) ,scene classification ,semantic segmentation ,urban informal settlement (UIS) ,Ocean engineering ,TC1501-1800 ,Geophysics. Cosmic physics ,QC801-809 - Abstract
Automatic urban informal settlement (UIS) extraction based on high-resolution remote sensing image (HRI) is of great significance for urban planning and management. This study proposes a scene classification-assisted and texture feature-enhanced multiscale network (STMNet) for UIS extraction. First, STMNet takes HRI and computed handcrafted texture feature (HTF) as input. Second, it employs a pseudo-siamese network to extract multidimensional deep features from HRI and HTF, respectively. In addition, a feature attention fusion module is constructed to fuse the aforementioned features. Finally, skip connection and feature decoder are utilized to obtain UIS extraction results. In detail, considering the sparse and dispersed distribution of UIS, a scene information aggregation and classification module is constructed to determine whether the input image patch contains UIS. For the characteristics of high spatial heterogeneity and various shapes and scales of UIS, an improved atrous spatial pyramid pooling is presented to extract multiscale and multireceptive field features. An edge loss function is applied during network training to minimize errors in the edge regions of UIS. The effectiveness of STMNet is tested on a self-produced UIS extraction dataset and the publicly available UIS-Shenzhen dataset. Quantitative results demonstrate that STMNet achieved the best performance in terms of fa, F1, and IoU. The mr is slightly higher than that of MAResU-Net and UisNet. In addition, STMNet achieved the best visual interpretation results and the fastest inference speed on the self-produced UIS extraction dataset.
- Published
- 2024
- Full Text
- View/download PDF
40. MTP: Advancing Remote Sensing Foundation Model via Multitask Pretraining
- Author
-
Di Wang, Jing Zhang, Minqiang Xu, Lin Liu, Dongsheng Wang, Erzhong Gao, Chengxi Han, Haonan Guo, Bo Du, Dacheng Tao, and Liangpei Zhang
- Subjects
Change detection ,foundation model ,multitask pretraining (MTP) ,object detection ,remote sensing (RS) ,scene classification ,Ocean engineering ,TC1501-1800 ,Geophysics. Cosmic physics ,QC801-809 - Abstract
Foundation models have reshaped the landscape of remote sensing (RS) by enhancing various image interpretation tasks. Pretraining is an active research topic, encompassing supervised and self-supervised learning methods to initialize model weights effectively. However, transferring the pretrained models to downstream tasks may encounter task discrepancy due to their formulation of pretraining as image classification or object discrimination tasks. In this study, we explore the multitask pretraining (MTP) paradigm for RS foundation models to address this issue. Using a shared encoder and task-specific decoder architecture, we conduct multitask supervised pretraining on the segment anything model annotated remote sensing segmentation dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection. MTP supports both convolutional neural networks and vision transformer foundation models with over 300 million parameters. The pretrained models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection. Extensive experiments across 14 datasets demonstrate the superiority of our models over existing ones of similar size and their competitive performance compared to larger state-of-the-art models, thus validating the effectiveness of MTP.
- Published
- 2024
- Full Text
- View/download PDF
41. Scene Recognition With Objectness, Attribute, and Category Learning
- Author
-
Li-Hui Zhao, Jean-Paul Ainam, Ji Zhang, and Wenai Song
- Subjects
Scene classification ,object detection ,attribute recognition ,attribute annotation ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Scene classification has established itself as a challenging research problem. Compared to images of individual objects, scene images can be much more semantically complex and abstract. Their difference mainly lies in the level of granularity of recognition. Yet, image recognition serves as a key pillar for the good performance of scene recognition, as the knowledge attained from object images can be used for accurate recognition of scenes. The existing scene recognition methods only take the category label of the scene into consideration. However, we find that the contextual information that contains detailed local descriptions is also beneficial in allowing the scene recognition model to be more discriminative. In this paper, we aim to improve scene recognition using attribute and category label information encoded in objects. Based on the complementarity of attribute and category labels, we propose a Multi-task Attribute-Scene Recognition (MASR) network which learns a category embedding and at the same time predicts scene attributes. Attribute acquisition and object annotation are tedious and time-consuming tasks. We tackle the problem by proposing a partially supervised annotation strategy in which human intervention is significantly reduced. The strategy provides a much more cost-effective solution to real-world scenarios and requires considerably less annotation effort. Moreover, we re-weight the attribute predictions considering the level of importance indicated by the object detection scores. Using the proposed method, we efficiently annotate attribute labels for four large-scale datasets, and systematically investigate how scene and attribute recognition benefit from each other. The experimental results demonstrate that MASR learns a more discriminative representation and achieves competitive recognition performance compared to the state-of-the-art methods.
- Published
- 2024
- Full Text
- View/download PDF
42. Generating Adversarial Examples Against Remote Sensing Scene Classification via Feature Approximation
- Author
-
Rui Zhu, Shiping Ma, Jiawei Lian, Linyuan He, and Shaohui Mei
- Subjects
Adversarial examples ,feature approximation (FA) ,remote sensing ,scene classification ,Ocean engineering ,TC1501-1800 ,Geophysics. Cosmic physics ,QC801-809 - Abstract
The existence of adversarial examples, which can change recognition results by adding well-designed perturbations to the original image, highlights the vulnerability of deep neural networks. This poses a great challenge to remote sensing image (RSI) scene classification. RSI scene classification primarily relies on the spatial and texture feature information of images, making attacks in the feature domain more effective. In this study, we introduce the feature approximation (FA) strategy, which generates adversarial examples by approximating clean image features to virtual images that are designed to not belong to any category. Our research aims to attack image classification models that are trained with RSI and discover the common vulnerabilities of these models. Specifically, we benchmark the FA attack using both featureless images and images generated via data augmentation methods. We then extend the FA attack to multimodel FA (MFA), improving the transferability of the attack. Finally, we show that the FA strategy is also effective for targeted attacks by approximating the input clean image features to the target category image features. Extensive experiments on the remote sensing classification datasets UC Merced and AID demonstrate the effectiveness of the proposed methods. The FA attack exhibits remarkable attack performance. Furthermore, the proposed MFA attack outperforms the success rate achieved by existing advanced targetless black-box attacks by an average of over 15%. The FA attack also performs better compared to multiple existing targeted white-box attacks.
- Published
- 2024
- Full Text
- View/download PDF
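The feature approximation idea in the entry above (pushing a clean image's deep features toward those of a category-free "virtual" image) can be sketched as a small gradient loop. Step size, iteration count, the L-infinity budget, and the use of a ResNet-18 feature extractor are illustrative assumptions, not the paper's exact settings.

```python
# Sketch only: crafting an adversarial example by approximating deep features
# of the input to those of a "virtual" (category-free) reference image.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights="DEFAULT")
feat_extractor = nn.Sequential(*list(backbone.children())[:-1]).eval()  # global-pooled features

def feature_approx_attack(x, x_virtual, eps=8 / 255, alpha=2 / 255, steps=10):
    with torch.no_grad():
        target_feat = feat_extractor(x_virtual)
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = nn.functional.mse_loss(feat_extractor(x_adv), target_feat)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv - alpha * grad.sign()              # move features toward the target
            x_adv = x + (x_adv - x).clamp(-eps, eps)         # stay inside the L-inf budget
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()

clean = torch.rand(1, 3, 224, 224)
virtual = torch.rand(1, 3, 224, 224)        # stand-in for a featureless/augmented reference
adv = feature_approx_attack(clean, virtual)
print((adv - clean).abs().max())
```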
43. Class-Incremental Novel Category Discovery in Remote Sensing Image Scene Classification via Contrastive Learning
- Author
-
Yifan Zhou, Haoran Zhu, Chang Xu, Ruixiang Zhang, Guang Hua, and Wen Yang
- Subjects
Contrastive learning ,incremental learning ,novel category discovery (NCD) ,remote sensing (RS) ,scene classification ,Ocean engineering ,TC1501-1800 ,Geophysics. Cosmic physics ,QC801-809 - Abstract
Remote sensing (RS) imagery captures the earth's ever-changing landscapes, reflecting evolving land cover patterns propelled by natural processes and human activities. However, existing RS scene classification methods mainly operate under a closed-set hypothesis, which stumbles when encountering novel emerging scenes. This article addresses the intricate task of RS scene classification without labels for novel scenes under incremental learning, termed class-incremental novel category discovery. We propose a contrastive learning-based novel category discovery pipeline tailored for RS image scene classification, enhancing the ability to learn unlabeled novel class data. Furthermore, within this pipeline, we introduce a positive pair filter to identify more positive sample pairs from novel classes, improving the feature representation capability on unlabeled data. Besides, our contrastive learning pipeline incorporates an old-feature replaying method to alleviate catastrophic forgetting in old classes. Extensive evaluations across three public RS datasets showcase the superiority of our method over state-of-the-art approaches.
- Published
- 2024
- Full Text
- View/download PDF
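A contrastive pipeline of the kind described in the entry above typically optimizes an InfoNCE-style objective over embeddings of two augmented views of each image. The sketch below shows that generic loss only; the paper's positive-pair filter and old-feature replay components are not reproduced, and the temperature is an assumption.

```python
# Sketch only: a generic InfoNCE / NT-Xent contrastive loss over two augmented views.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (N, D) embeddings of two views of the same N images."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                       # (2N, D)
    sim = z @ z.t() / temperature                        # cosine similarities
    n = z1.shape[0]
    mask = torch.eye(2 * n, dtype=torch.bool)
    sim.masked_fill_(mask, float("-inf"))                # drop self-similarities
    # The positive for sample i is its other view at index i+n (or i-n).
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

loss = info_nce(torch.randn(32, 128), torch.randn(32, 128))
print(float(loss))
```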
44. Large Kernel Separable Mixed ConvNet for Remote Sensing Scene Classification
- Author
-
Keqian Zhang, Tengfei Cui, Wei Wu, Xueke Zheng, and Gang Cheng
- Subjects
Channel separation and mixing ,large kernel convolution ,remote sensing ,scene classification ,Ocean engineering ,TC1501-1800 ,Geophysics. Cosmic physics ,QC801-809 - Abstract
Among tasks related to the intelligent interpretation of remote sensing data, scene classification mainly focuses on the holistic information of the entire scene. Compared with pixel-level or object-based tasks, it involves a richer semantic context, making it more challenging. With the rapid advancement of deep learning, convolutional neural networks (CNNs) have found widespread applications across various domains, and some work has introduced them into scene classification tasks. However, traditional convolution operations involve sliding small convolutional kernels across an image, primarily focusing on local details within a small receptive field; this small receptive field limits the ability of the convolution operation to capture features over a broader range and thus to model the entire image. To this end, we introduce large-kernel CNNs into the scene classification task to expand the receptive field of the model, which allows us to capture comprehensive nonlocal information while still acquiring rich local details. However, in addition to encoding spatial associations, the effective information within the feature maps is also strongly channel related. Therefore, to fully model this channel dependency, a novel channel separation and mixing module has been designed to realize feature correlation in the channel dimension. The combination of these forms a large kernel separable mixed ConvNet, enabling the model to capture effective dependencies of feature maps in both spatial and channel dimensions, thus achieving enhanced feature expression. Extensive experiments conducted on three datasets have also validated the effectiveness of the proposed method.
- Published
- 2024
- Full Text
- View/download PDF
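The two ingredients named in the entry above (a large-kernel depthwise convolution for a wide receptive field, and channel separation/mixing for channel dependency) can be sketched as a single PyTorch block. The kernel size, channel counts, and the shuffle-based mixing shown here are illustrative assumptions rather than the paper's exact design.

```python
# Sketch only: large-kernel depthwise conv + channel split-and-shuffle mixing.
import torch
import torch.nn as nn

class LargeKernelMixBlock(nn.Module):
    def __init__(self, channels=64, kernel=13, groups=4):
        super().__init__()
        self.groups = groups
        # Depthwise large-kernel conv: wide receptive field at low cost.
        self.dw = nn.Conv2d(channels, channels, kernel, padding=kernel // 2, groups=channels)
        # Pointwise conv applied per channel group ("separation").
        self.pw = nn.Conv2d(channels, channels, 1, groups=groups)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.GELU()

    def channel_shuffle(self, x):
        b, c, h, w = x.shape
        x = x.view(b, self.groups, c // self.groups, h, w)
        return x.transpose(1, 2).reshape(b, c, h, w)       # mix information across groups

    def forward(self, x):
        y = self.pw(self.dw(x))
        y = self.channel_shuffle(y)
        return self.act(self.bn(y)) + x                    # residual connection

out = LargeKernelMixBlock()(torch.randn(2, 64, 56, 56))
print(out.shape)  # torch.Size([2, 64, 56, 56])
```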
45. Progressive Feature Fusion Framework Based on Graph Convolutional Network for Remote Sensing Scene Classification
- Author
-
Chongyang Zhang and Bin Wang
- Subjects
Feature fusion ,graph convolutional network (GCN) ,graph learning ,remote sensing (RS) ,scene classification ,Ocean engineering ,TC1501-1800 ,Geophysics. Cosmic physics ,QC801-809 - Abstract
Remote sensing (RS) scene classification plays an important role in the intelligent interpretation of RS data. Recently, convolutional neural network (CNN)-based and attention-based methods have become the mainstream of RS scene classification with impressive results. However, existing CNN-based methods do not utilize long-range information, and existing attention-based methods do not fully exploit multiscale information, although both aspects of information are essential for a comprehensive understanding of RS scene images. To overcome the above limitations, we propose a progressive feature fusion (PFF) framework based on graph convolutional network (GCN), namely PFFGCN, for RS scene classification in this article, which has a strong ability to learn both multiscale and contextual (local/long-range) information in RS scene images. It mainly consists of two modules: a multilayer feature extraction (MFE) module and a multiscale contextual information fusion (MCIF) module. The MFE module is utilized to extract multilevel features and global features, and the MCIF module is constructed to capture rich contextual information from multilevel features and fuse them in a progressive manner. In MCIF, GCN is adopted to explore intrinsic attributes (including the topological structure and the contextual information) hidden in each feature map. Through the PFF strategy, the graph features at each level are fused with the next-level features to reduce the semantic gap between nonadjacent features and enhance the multiscale representation of the model. Besides, a grouped GCN based on channel grouping is further proposed to improve the efficiency of PFFGCN. The proposed method is extensively evaluated on various RS scene classification datasets, and the experimental results demonstrate that the proposed method outperforms current state-of-the-art methods.
- Published
- 2024
- Full Text
- View/download PDF
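The graph convolution used in the fusion module above follows the standard normalized propagation rule X' = D^{-1/2}(A + I)D^{-1/2} X W. A minimal dense implementation is sketched below; how the paper builds graphs from feature maps, and the grouped variant, are not reproduced.

```python
# Sketch only: one dense graph convolution layer (Kipf & Welling style propagation).
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x, adj):
        """x: (N, in_dim) node features, adj: (N, N) adjacency (no self-loops)."""
        a_hat = adj + torch.eye(adj.shape[0], device=adj.device)   # add self-loops
        deg_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
        norm = deg_inv_sqrt.unsqueeze(1) * a_hat * deg_inv_sqrt.unsqueeze(0)
        return torch.relu(norm @ self.weight(x))                   # propagate, then activate

nodes = torch.randn(6, 32)                  # e.g. 6 regions with 32-d features
adj = (torch.rand(6, 6) > 0.5).float()
adj = ((adj + adj.t()) > 0).float()         # make the toy graph symmetric
out = GCNLayer(32, 16)(nodes, adj)
print(out.shape)  # torch.Size([6, 16])
```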
46. Multi-Label Scene Classification on Remote Sensing Imagery Using Modified Dingo Optimizer With Deep Learning
- Author
-
Mahmoud Ragab
- Subjects
Remote sensing images ,deep learning ,scene classification ,hyperparameter tuning ,computer vision ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Multi-label scene classification on remote sensing imagery (RSI) involves the classification of images into multiple categories or labels, where each image belongs to more than one class or scene. This is a common task in RS and computer vision, especially for applications like urban planning, land cover classification, and environmental monitoring. By leveraging the power of deep learning (DL), this model extracts high-level features from the imagery, facilitating efficient and accurate scene classification, which is indispensable for applications including environmental analysis, land use monitoring, and disaster management. This study introduces a new Multi-Label Scene Classification on Remote Sensing Imagery using Modified Dingo Optimizer with Deep Learning (MSCRSI-MDODL) technique. The MSCRSI-MDODL technique targets the identification and classification of multiple target classes from the RSI. In the presented MSCRSI-MDODL technique, an attention-based Squeeze-and-Excitation (SE) DenseNet model, named the improved DenseNet model, is applied for feature extraction. Besides, the MDO algorithm is employed for the optimal hyperparameter tuning of the improved DenseNet model. For the scene classification process, the MSCRSI-MDODL technique makes use of the stacked dilated convolutional autoencoder (SDCAE) model. The simulation analysis of the MSCRSI-MDODL model is tested on benchmark RSI datasets. The comprehensive result analysis demonstrated the higher performance of the MSCRSI-MDODL technique over other existing techniques for RSI classification.
- Published
- 2024
- Full Text
- View/download PDF
47. Class-Aware Self-Distillation for Remote Sensing Image Scene Classification
- Author
-
Bin Wu, Siyuan Hao, and Wei Wang
- Subjects
Deep learning ,knowledge distillation (KD) ,remote sensing image ,scene classification ,vision transformer (ViT) ,Ocean engineering ,TC1501-1800 ,Geophysics. Cosmic physics ,QC801-809 - Abstract
Currently, convolutional neural networks (CNNs) and vision transformers (ViTs) are widely adopted as the predominant neural network architectures for remote sensing image scene classification. Although CNNs have lower computational complexity, ViTs have a higher performance ceiling, making both suitable as backbone networks for remote sensing scene classification tasks. However, remote sensing imagery has high intraclass variation and interclass similarity, which poses a challenge for existing methods. To address this issue, we propose the class-aware self-distillation (CASD) framework. This framework uses an end-to-end distillation mechanism to mine class-aware knowledge, effectively reducing the impact of significant intraclass variation and interclass similarity in remote sensing imagery. Specifically, our approach involves constructing pairs of images: similar pairs consisting of images belonging to the same class, and dissimilar pairs consisting of images from different classes. We then apply a distillation loss that we designed, which distills the corresponding probability distributions to ensure that the distributions of similar pairs become more consistent and those of dissimilar pairs become more distinct. In addition, an enforced learnable weight α added to the distillation loss further amplifies the network's ability to comprehend class-aware knowledge. The experimental results demonstrate that our CASD method outperforms other methods on four publicly available datasets, and ablation experiments demonstrate the effectiveness of the method.
- Published
- 2024
- Full Text
- View/download PDF
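The pairwise distillation objective described in the entry above (pull the predictive distributions of same-class image pairs together, push different-class pairs apart) can be sketched as a symmetric KL-based loss with a learnable weight. The margin value, temperature, and the exact pairing strategy below are illustrative assumptions, not the paper's formulation.

```python
# Sketch only: class-aware pairwise distillation loss on softened predictions.
import torch
import torch.nn.functional as F

def class_aware_distill_loss(logits_a, logits_b, same_class, alpha, tau=4.0, margin=1.0):
    """logits_a/b: (N, C) predictions for paired images; same_class: (N,) bool mask."""
    p = F.softmax(logits_a / tau, dim=1)
    log_q = F.log_softmax(logits_b / tau, dim=1)
    kl = F.kl_div(log_q, p, reduction="none").sum(dim=1)       # per-pair KL(p || q)
    pull = kl[same_class].mean() if same_class.any() else kl.new_zeros(())
    push = F.relu(margin - kl[~same_class]).mean() if (~same_class).any() else kl.new_zeros(())
    return alpha * pull + push

alpha = torch.nn.Parameter(torch.tensor(1.0))                  # learnable weighting
la, lb = torch.randn(8, 45), torch.randn(8, 45)
same = torch.tensor([True, True, False, True, False, False, True, False])
print(class_aware_distill_loss(la, lb, same, alpha))
```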
49. Instance-Level Scaling and Dynamic Margin-Alignment Knowledge Distillation for Remote Sensing Image Scene Classification
- Author
-
Chuan Li, Xiao Teng, Yan Ding, and Long Lan
- Subjects
knowledge distillation ,scaling distillation ,scene classification ,model compression ,deep learning ,Science - Abstract
Remote sensing image (RSI) scene classification aims to identify semantic categories in RSI using neural networks. However, high-performance deep neural networks typically demand substantial storage and computational resources, making practical deployment challenging. Knowledge distillation has emerged as an effective technique for developing compact models that maintain high classification accuracy in RSI tasks. Existing knowledge distillation methods often overlook the high inter-class similarity in RSI scenes, leading to low-confidence soft labels from the teacher model, which can mislead the student model. Conversely, overly confident soft labels may discard valuable non-target information. Additionally, the significant intra-class variability in RSI contributes to instability in the model’s decision boundaries. To address these challenges, we propose an efficient method called instance-level scaling and dynamic margin-alignment knowledge distillation (ISDM) for RSI scene classification. To balance the target and non-target class influence, we apply an entropy regularization loss to scale the teacher model’s target class at the instance level. Moreover, we introduce dynamic margin alignment between the student and teacher models to improve the student’s discriminative capability. By optimizing soft labels and enhancing the student’s ability to distinguish between classes, our method reduces the effects of inter-class similarity and intra-class variability. Experimental results on three public RSI scene classification datasets (AID, UCMerced, and NWPU-RESISC) demonstrate that our method achieves state-of-the-art performance across all teacher–student pairs with lower computational costs. Additionally, we validate the generalization of our approach on general datasets, including CIFAR-100 and ImageNet-1k.
- Published
- 2024
- Full Text
- View/download PDF
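The instance-level scaling idea in the entry above can be sketched as a standard knowledge distillation loss whose per-sample weight varies with the teacher's prediction entropy, so low-confidence teacher outputs (common under high inter-class similarity) contribute less. The entropy-to-weight mapping below is an illustrative assumption, not the paper's exact entropy regularization, and the margin-alignment term is omitted.

```python
# Sketch only: knowledge distillation with per-instance weights from teacher entropy.
import torch
import torch.nn.functional as F

def instance_scaled_kd(student_logits, teacher_logits, labels, tau=4.0, ce_weight=0.5):
    p_t = F.softmax(teacher_logits / tau, dim=1)
    log_p_s = F.log_softmax(student_logits / tau, dim=1)
    # Per-instance KD term (temperature-scaled KL divergence).
    kd = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=1) * tau * tau
    # Down-weight instances where the teacher is uncertain (high entropy).
    entropy = -(p_t * p_t.clamp_min(1e-8).log()).sum(dim=1)
    weight = 1.0 / (1.0 + entropy)                       # assumed mapping, in (0, 1]
    ce = F.cross_entropy(student_logits, labels)
    return ce_weight * ce + (1 - ce_weight) * (weight * kd).mean()

s, t = torch.randn(16, 45), torch.randn(16, 45)
y = torch.randint(0, 45, (16,))
print(instance_scaled_kd(s, t, y))
```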
50. A New Scene Sensing Model Based on Multi-Source Data from Smartphones
- Author
-
Zhenke Ding, Zhongliang Deng, Enwen Hu, Bingxun Liu, Zhichao Zhang, and Mingyang Ma
- Subjects
multi-source sensor ,scene classification ,GNSS ,data mining ,CNN ,Chemical technology ,TP1-1185 - Abstract
Smartphones with integrated sensors play an important role in people’s lives, and in advanced multi-sensor fusion navigation systems, the use of individual sensor information is crucial. Because of the different environments, the weights of the sensors will be different, which will also affect the method and results of multi-source fusion positioning. Based on the multi-source data from smartphone sensors, this study explores five types of information—Global Navigation Satellite System (GNSS), Inertial Measurement Units (IMUs), cellular networks, optical sensors, and Wi-Fi sensors—characterizing the temporal, spatial, and mathematical statistical features of the data, and it constructs a multi-scale, multi-window, and context-connected scene sensing model to accurately detect the environmental scene in indoor, semi-indoor, outdoor, and semi-outdoor spaces, thus providing a good basis for multi-sensor positioning in a multi-sensor navigation system. Detecting environmental scenes provides an environmental positioning basis for multi-sensor fusion localization. This model is divided into four main parts: multi-sensor-based data mining, a multi-scale convolutional neural network (CNN), a bidirectional long short-term memory (BiLSTM) network combined with contextual information, and a meta-heuristic optimization algorithm.
- Published
- 2024
- Full Text
- View/download PDF
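The multi-scale CNN plus BiLSTM structure described in the entry above maps naturally onto parallel 1-D convolutions with different kernel sizes over the sensor time series, followed by a bidirectional LSTM and a scene classifier. The four scene classes (indoor, semi-indoor, outdoor, semi-outdoor) come from the abstract; channel counts, window length, and the feature dimension are illustrative assumptions, and the meta-heuristic optimization stage is omitted.

```python
# Sketch only: multi-scale 1-D CNN + BiLSTM over multi-sensor time series.
import torch
import torch.nn as nn

class SceneSenseNet(nn.Module):
    def __init__(self, n_features=12, n_classes=4, hidden=64):
        super().__init__()
        # Parallel convolutions with different kernel sizes give multi-scale temporal features.
        self.branches = nn.ModuleList(
            [nn.Conv1d(n_features, 32, k, padding=k // 2) for k in (3, 5, 7)]
        )
        self.lstm = nn.LSTM(96, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                     # x: (B, T, n_features) sensor windows
        x = x.transpose(1, 2)                 # to (B, n_features, T) for Conv1d
        feats = torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
        out, _ = self.lstm(feats.transpose(1, 2))
        return self.head(out[:, -1])          # classify from the last time step

logits = SceneSenseNet()(torch.randn(8, 128, 12))
print(logits.shape)  # torch.Size([8, 4])
```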