9,150 results for "feature fusion"
Search Results
2. EBFF-YOLO: enhanced bimodal feature fusion network for UAV image object detection.
- Author
-
Xue, Ping and Zhang, Zhen
- Abstract
Existing methods for fusing visible light and infrared features often focus on separating objects from backgrounds. In unmanned aerial vehicle (UAV) images, background areas typically contain multiple types of small objects, and the dispersed nature of these objects reduces the effectiveness of feature fusion, leading to missed detections and false alarms for aerial small objects. This paper introduces a bimodal feature fusion network named EBFF-Net, designed to address UAV image object detection in low-visibility environments. In this study, a shallow learning network is utilized to extract complementary features, and the Parallel Shallow Feature Fusion (PSFF) method is designed to extract and fuse bimodal features. Additionally, a reconfigurable structure with diverse branch blocks is introduced into the bottleneck layer to better capture feature information without increasing the computational burden during inference. Furthermore, leveraging the geometric properties of two-dimensional rectangles and an adaptive weighting algorithm, a novel localization loss function is developed. Subjective and objective testing on the VEDAI, DIOR, and VAID datasets validates the method's efficacy and lightweight design. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
3. Retrieving images with missing regions by fusion of content and semantic features.
- Author
-
Taheri, Fatemeh, Rahbar, Kambiz, and Beheshtifard, Ziaeddin
- Subjects
ARTIFICIAL neural networks ,GENERATIVE adversarial networks ,FEATURE extraction ,CONTENT-based image retrieval ,IMAGE retrieval - Abstract
Deep neural networks, with their strong ability to learn and extract discriminative image features, contribute significantly to image retrieval systems. Poor performance when retrieving query images with missing regions, however, is a weak point of such systems. In this paper, a generative adversarial network is proposed to inpaint incomplete images with missing regions in the image retrieval system. Query image inpainting is performed simultaneously at both the general and partial levels with two generative networks. Inpainted areas include the semantic and visual features of the input query image, and the inpainted image can then be used in the retrieval system. In the image retrieval process, the content features of the image are extracted from handcrafted features and the VGG-16 deep neural network, including color, texture, and semantic features. The attribute vector of each image is obtained by fusing the attributes of both parts. Finally, similar images are retrieved based on the smallest Euclidean distance. The importance of image features, in the form of effective superpixels, is also interpreted before and after applying the LIME technique. The performance of the image retrieval model is confirmed on the ImageNet dataset. [ABSTRACT FROM AUTHOR]
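The retrieval step this abstract describes (fuse handcrafted and deep attribute vectors, then rank gallery images by smallest Euclidean distance) can be sketched in a few lines. This is an illustrative numpy sketch, not the authors' implementation; the function names and the L2 normalization of each part are assumptions.

```python
import numpy as np

def fuse_features(handcrafted, deep):
    """Concatenate L2-normalized handcrafted and deep feature vectors."""
    h = handcrafted / (np.linalg.norm(handcrafted) + 1e-12)
    d = deep / (np.linalg.norm(deep) + 1e-12)
    return np.concatenate([h, d])

def retrieve(query_vec, gallery_vecs, top_k=3):
    """Return indices of gallery images with smallest Euclidean distance."""
    dists = np.linalg.norm(gallery_vecs - query_vec, axis=1)
    return np.argsort(dists)[:top_k]

rng = np.random.default_rng(0)
gallery = np.stack([fuse_features(rng.random(8), rng.random(16)) for _ in range(50)])
query = gallery[7] + 0.01 * rng.random(24)  # near-duplicate of gallery image 7
ranked = retrieve(query, gallery)           # ranked[0] should be index 7
```

Normalizing each part before concatenation keeps one feature family (e.g. the 4096-dim VGG-16 vector) from dominating the distance purely by its scale.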
- Published
- 2024
- Full Text
- View/download PDF
4. VT-BPAN: vision transformer-based bilinear pooling and attention network fusion of RGB and skeleton features for human action recognition.
- Author
-
Sun, Yaohui, Xu, Weiyao, Yu, Xiaoyi, and Gao, Ju
- Subjects
HUMAN activity recognition ,TRANSFORMER models ,KINECT (Motion sensor) ,HUMAN skeleton ,SKELETON - Abstract
Recent generations of the Microsoft Kinect camera capture a series of multimodal signals providing RGB video, depth sequences, and skeleton information, making it possible to achieve enhanced human action recognition performance by fusing different data modalities. However, most existing fusion methods simply fuse different features, ignoring the underlying semantics between modalities and leading to a lack of accuracy; in addition, there is a large amount of background noise. In this work, we propose a Vision Transformer-based Bilinear Pooling and Attention Network (VT-BPAN) fusion mechanism for human action recognition. This work improves recognition accuracy in the following ways: 1) An effective two-stream feature pooling and fusion mechanism is proposed, in which RGB frames and the skeleton are fused to enhance the spatio-temporal feature representation. 2) A spatially lightweight multiscale vision Transformer is proposed, which reduces the cost of computing. The framework is evaluated on three widely used video action datasets, and the proposed approach achieves performance comparable with state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. Skin cancer classification leveraging multi-directional compact convolutional neural network ensembles and gabor wavelets.
- Author
-
Attallah, Omneya
- Subjects
- *
CONVOLUTIONAL neural networks , *SKIN cancer , *TUMOR classification , *FEATURE selection , *DEEP learning , *SUPERIOR colliculus - Abstract
Skin cancer (SC) is an important medical condition that necessitates prompt identification to ensure timely treatment. Although visual evaluation by dermatologists is considered the most reliable method, it is subjective and laborious. Deep learning-based computer-aided diagnostic (CAD) platforms have become valuable tools for supporting dermatologists. Nevertheless, current CAD tools frequently depend on Convolutional Neural Networks (CNNs) with large numbers of layers and hyperparameters, single-CNN-model methodologies, and large feature spaces, and they exclusively utilise spatial image information, which restricts their effectiveness. This study presents SCaLiNG, an innovative CAD tool specifically developed to surpass these constraints. SCaLiNG leverages a collection of three compact CNNs and Gabor wavelets (GW) to acquire a comprehensive feature vector consisting of spatial-textural-frequency attributes. SCaLiNG gathers a wide range of image details by decomposing the images into multiple directional sub-bands using GW and then training several CNNs on those sub-bands and the original image. SCaLiNG then fuses the attributes taken from the various CNNs trained on the actual images and the GW-derived sub-bands; this fusion improves diagnostic accuracy through a thorough representation of attributes. Furthermore, SCaLiNG applies a feature selection approach that further enhances performance by choosing the most distinguishing features. Experimental findings indicate that SCaLiNG achieves a classification accuracy of 0.9170 in categorising SC subcategories, surpassing conventional single-CNN models. This performance underlines its ability to aid dermatologists in swiftly and precisely recognising and classifying SC, thereby enhancing patient outcomes. [ABSTRACT FROM AUTHOR]
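The directional sub-band decomposition described above can be sketched with a small Gabor filter bank. This is a minimal numpy illustration of the general technique, not SCaLiNG's pipeline; the kernel parameters (size, sigma, wavelength) and the FFT-based convolution are assumptions.

```python
import numpy as np

def gabor_kernel(theta, ksize=9, sigma=2.0, lam=4.0):
    """Real part of a Gabor filter oriented at angle theta (radians)."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)          # carrier along theta
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)

def directional_subbands(img, n_orient=4):
    """Decompose an image into directional sub-bands with a Gabor filter bank."""
    bands = []
    for k in range(n_orient):
        kern = gabor_kernel(np.pi * k / n_orient)
        kp = np.zeros_like(img)
        kp[:kern.shape[0], :kern.shape[1]] = kern
        # circular FFT convolution, rolled so the kernel is centered
        out = np.fft.irfft2(np.fft.rfft2(img) * np.fft.rfft2(kp), s=img.shape)
        bands.append(np.roll(out, (-(kern.shape[0] // 2), -(kern.shape[1] // 2)),
                             axis=(0, 1)))
    return np.stack(bands)

rng = np.random.default_rng(0)
image = rng.random((32, 32))
subbands = directional_subbands(image)  # one filtered map per orientation
```

Each sub-band emphasizes texture energy at one orientation; in a SCaLiNG-like setup, each would be fed to its own compact CNN before feature fusion.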
- Published
- 2024
- Full Text
- View/download PDF
6. A Study of Assisted Screening for Alzheimer's Disease Based on Handwriting and Gait Analysis.
- Author
-
Qi, Hengnian, Zhu, Xiaorong, Ren, Yinxia, Zhang, Xiaoya, Tang, Qizhe, Zhang, Chu, Lang, Qing, and Wang, Lina
- Subjects
- *
MACHINE learning , *GRAPHOLOGY , *ALZHEIMER'S disease , *NEURODEGENERATION , *DISEASE progression - Abstract
Background: Alzheimer's disease (AD) is a progressive neurodegenerative disease that is not easily detected in the early stage. Handwriting and walking have been shown to be potential indicators of cognitive decline and are often affected by AD. Objective: This study proposes an assisted screening framework for AD based on multimodal analysis of handwriting and gait and explores whether using a combination of multiple modalities can improve the accuracy of single modality classification. Methods: We recruited 90 participants (38 AD patients and 52 healthy controls). The handwriting data was collected under four handwriting tasks using dot-matrix digital pens, and the gait data was collected using an electronic trail. The two kinds of features were fused as inputs for several different machine learning models (Logistic Regression, SVM, XGBoost, Adaboost, LightGBM), and the model performance was compared. Results: The accuracy of each model ranged from 71.95% to 96.17%. Among them, the model constructed by LightGBM had the best performance, with an accuracy of 96.17%, sensitivity of 95.32%, specificity of 96.78%, PPV of 95.94%, NPV of 96.74%, and AUC of 0.991. However, the highest accuracy of a single modality was 93.53%, which was achieved by XGBoost in gait features. Conclusions: The research results show that the combination of handwriting features and gait features can achieve better classification results than a single modality. In addition, the assisted screening model proposed in this study can achieve effective classification of AD, which has development and application prospects. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
7. CrackYOLO: towards efficient dam crack detection for underwater scenes.
- Author
-
Shi, Pengfei, Shao, Shen, Fan, Xinnan, Xin, Yuanxue, Zhou, Zhongkai, Cao, Pengfei, Li, Xinyu, and Zhu, Sisi
- Abstract
Cracks are one of the main factors threatening the safety of dams, and automatic image object detection is the main approach to underwater dam crack detection. However, traditional methods suffer from low crack detection speed, high false alarm rates, and poor robustness, and existing methods cannot achieve satisfying detection results at high detection speed. To solve these problems, we propose an efficient dam crack detection method for underwater scenes, called CrackYOLO. Firstly, to better integrate multi-scale features without incurring excessive computational costs, we propose a feature fusion module in CrackYOLO. Next, we re-design the skip-connections in the network to obtain better features while compressing the overall model parameters. Then, we propose a feature extraction module called Res2C3, which combines semantic and location information. After that, we propose BCAtt, which makes features focus on both channel and location information. Finally, according to the characteristics of underwater dam crack images, we use a genetic algorithm to select the best hyperparameter values for the model. The experimental results show that the proposed method detects underwater dam cracks robustly with less computational cost. CrackYOLO achieves 94.3% mAP[0.5] at 151 FPS on the underwater crack detection task, enabling real-time detection in practice. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. Small object detection based on YOLOv8 in UAV perspective.
- Author
-
Ning, Tao, Wu, Wantong, and Zhang, Jin
- Abstract
Unmanned aerial vehicle (UAV) image object detection is a challenging task, primarily due to factors such as multi-scale objects, a high proportion of small objects, significant overlap between objects, poor image quality, and complex, dynamic scenes. To address these challenges, several improvements were made to the YOLOv8 model. Firstly, pruning the feature mapping layers responsible for detecting large objects in YOLOv8 significantly reduced computational resources, rendering the model more lightweight; simultaneously, a detection head fused with self-attention was introduced to enhance the detection capability for small objects. Secondly, space-to-depth convolution was introduced in place of the original convolutional striding and pooling operations, more effectively preserving details in low-resolution images and small objects. Lastly, a multi-level feature fusion module was designed to merge feature maps from different network layers, enhancing the network's representation capability. Results on the VisDrone dataset demonstrate that the proposed model achieved a significant 4.7% improvement in mAP50 compared to YOLOv8, while reducing the parameter count to only 39% of the original model. Moreover, transfer experiments on the TT100k dataset showed a 3.2% increase in mAP50, validating the effectiveness of the improved model for small object detection tasks in UAV images. Our code is made available at . [ABSTRACT FROM AUTHOR]
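The space-to-depth substitution mentioned above is a lossless rearrangement: instead of a strided convolution or pooling discarding pixels, every pixel is moved into the channel dimension. A minimal numpy sketch of that rearrangement (the function name and block size are illustrative, not from the paper):

```python
import numpy as np

def space_to_depth(x, block=2):
    """Rearrange a (C, H, W) feature map into (C*block*block, H//block, W//block).

    Unlike strided convolution or pooling, no values are discarded:
    spatial detail is moved into channels, which is why this helps
    preserve small objects in low-resolution imagery.
    """
    c, h, w = x.shape
    assert h % block == 0 and w % block == 0
    x = x.reshape(c, h // block, block, w // block, block)
    return x.transpose(0, 2, 4, 1, 3).reshape(c * block * block,
                                              h // block, w // block)

feat = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
out = space_to_depth(feat)  # (8, 2, 2): 4x the channels, half the resolution
```

A non-strided convolution applied after this rearrangement then plays the role of the original downsampling layer.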
- Published
- 2024
- Full Text
- View/download PDF
9. DFEF: Diversify feature enhancement and fusion for online knowledge distillation.
- Author
-
Liang, Xingzhu, Zhang, Jian, Liu, Erhu, and Fang, Xianjin
- Subjects
- *
TRAINING of student teachers , *TRADITIONAL knowledge , *INFORMATION networks , *TEACHERS - Abstract
Traditional knowledge distillation relies on high‐capacity teacher models to supervise the training of compact student networks. To avoid the computational resource costs associated with pretraining high‐capacity teacher models, teacher‐free online knowledge distillation methods have achieved satisfactory performance. Among these methods, feature fusion methods have effectively alleviated the limitations of training without the strong guidance of a powerful teacher model. However, existing feature fusion methods often focus primarily on end‐layer features, overlooking the efficient utilization of holistic knowledge loops and high‐level information within the network. In this article, we propose a new feature fusion‐based mutual learning method called Diversify Feature Enhancement and Fusion for Online Knowledge Distillation (DFEF). First, we enhance advanced semantic information by mapping multiple end‐of‐network features to obtain richer feature representations. Next, we design a self‐distillation module to strengthen knowledge interactions between the deep and shallow network layers. Additionally, we employ attention mechanisms to provide deeper and more diversified enhancements to the input feature maps of the self‐distillation module, allowing the entire network architecture to acquire a broader range of knowledge. Finally, we employ feature fusion to merge the enhanced features and generate a high‐performance virtual teacher to guide the training of the student model. Extensive evaluations on the CIFAR‐10, CIFAR‐100, and CINIC‐10 datasets demonstrate that our proposed method can significantly enhance performance compared to state‐of‐the‐art feature fusion‐based online knowledge distillation methods. Our code can be found at https://github.com/JSJ515-Group/DFEF-Liu. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. Crack detection method for concrete surface based on feature fusion.
- Author
-
Hong, Cheng
- Subjects
- *
CRACKING of concrete , *IMAGE recognition (Computer vision) , *FEATURE extraction , *DEEP learning , *GENERALIZATION - Abstract
In recent years, detection methods based on deep learning have received widespread attention in the field of concrete crack detection. In view of the shortcomings of traditional image detection methods, a concrete crack detection method based on feature fusion is proposed. The Fourier frequency-domain processed image is used as an input to the deep learning neural network: the original time-domain image and the frequency-domain image are each fed into one of two feature extraction modules to extract high-level features, the two features are fused to fully capture time-domain and frequency-domain characteristics, and the fused features yield the final concrete crack detection result. The performance of the proposed method is compared with VGG-16, AlexNet, and DenseNet; experiments show that its accuracy is higher than all three, demonstrating good results in concrete crack detection. To verify the generalization ability of the proposed model, the Concrete Crack Images for Classification dataset was input into the model for testing, and the experimental results show good generalization ability. [ABSTRACT FROM AUTHOR]
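The time-domain/frequency-domain fusion idea above can be illustrated with simple descriptors: spatial statistics of the raw image concatenated with statistics of its Fourier magnitude spectrum. This is a toy numpy sketch of the fusion concept, not the paper's two-branch network; the particular statistics chosen are assumptions.

```python
import numpy as np

def time_freq_features(img):
    """Fuse simple time-domain and Fourier-domain descriptors of an image."""
    spatial = np.array([img.mean(), img.std()])       # time-domain statistics
    mag = np.abs(np.fft.fft2(img))
    mag /= mag.sum() + 1e-12                          # normalized magnitude spectrum
    spectral = np.array([mag.max(), (mag**2).sum()])  # frequency-domain statistics
    return np.concatenate([spatial, spectral])        # fused feature vector

rng = np.random.default_rng(1)
vec = time_freq_features(rng.random((32, 32)))  # 4-dim fused descriptor
```

In the paper's setting, the two halves of the vector would instead be high-level features from two CNN branches, but the fusion step is the same concatenation.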
- Published
- 2024
- Full Text
- View/download PDF
11. MFFAE-Net: semantic segmentation of point clouds using multi-scale feature fusion and attention enhancement networks.
- Author
-
Liu, Wei, Lu, Yisheng, and Zhang, Tao
- Abstract
Point cloud data reflect rich information about real 3D space and have gained increasing attention in the computer vision field, but the unstructured and unordered nature of point clouds poses many challenges to their study. How to learn global features directly from the raw point cloud has been a long-standing research problem. In research based on encoder-decoder structures, many researchers focus on designing the encoder to better extract features and do not further explore more globally representative features from the combined encoder and decoder features. To solve this problem, we propose the MFFAE-Net method, which aims to obtain more globally representative point cloud features by using the feature learning of the encoder-decoder stages. Our method first enhances the feature information of the input point cloud by merging the information of its neighboring points, which helps the subsequent feature extraction. Secondly, a channel attention module further processes the extracted features to highlight the role of important channels. Finally, we fuse features of different scales, as well as features of the same scale, from the encoding and decoding features to obtain more global point cloud features, which helps improve point cloud segmentation results. Experimental results show that the method performs well on some objects in the S3DIS and Toronto3D datasets. [ABSTRACT FROM AUTHOR]
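The first step described above (enhancing each point's feature by merging information from its neighbors) amounts to a k-nearest-neighbor aggregation. A minimal numpy sketch under stated assumptions: brute-force distances, mean aggregation, and concatenation (the paper's exact aggregation may differ).

```python
import numpy as np

def knn_enhance(points, feats, k=3):
    """Augment each point's feature with the mean feature of its k nearest neighbors."""
    # pairwise Euclidean distances between all points (brute force, O(N^2))
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # exclude the point itself
    idx = np.argsort(d, axis=1)[:, :k]       # indices of k nearest neighbors
    neighbor_mean = feats[idx].mean(axis=1)  # aggregate neighbor features
    return np.concatenate([feats, neighbor_mean], axis=1)

rng = np.random.default_rng(2)
pts, f = rng.random((100, 3)), rng.random((100, 8))
enhanced = knn_enhance(pts, f)  # each point now carries local-context features
```

Real point-cloud pipelines replace the brute-force distance matrix with a KD-tree or ball query, but the enhancement logic is the same.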
- Published
- 2024
- Full Text
- View/download PDF
12. Human pose estimation based on cross-view feature fusion.
- Author
-
Sun, Dandan, Wang, Siqi, Xia, Hailun, Zhang, Changan, Gao, Jianlong, and Mao, Mingyu
- Subjects
- *
FEATURE extraction , *HUMAN beings , *DEEP learning - Abstract
Multi-view human pose estimation can achieve high accuracy by leveraging complex spatial information from multiple perspectives. However, increasing the number of views can strain the network model, potentially compromising estimation accuracy under limited computing resources. Furthermore, in the current approach of using ResNet for feature extraction, traditional methods rely on deconvolution to obtain large feature maps, which can introduce artifacts. To tackle these challenges, we propose a perceptual network based on flexible-combination view feature fusion, comprising three crucial modules. The flexible view combination policy module enables high accuracy from just a single reference view, avoiding the increased complexity caused by a large number of views. The up-sampling module, based on sub-pixel convolution, achieves efficient high-resolution recovery, resolving the artifacts introduced by deconvolution. Additionally, the feature fusion module maximizes the utilization of reference view cues to enhance human pose estimation in the current view. Experiments conducted on the Human3.6M dataset demonstrate a reduction in average MPJPE to 18.3 mm using our model. [ABSTRACT FROM AUTHOR]
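The sub-pixel up-sampling mentioned above rests on the pixel-shuffle rearrangement: a convolution produces r*r channel groups at low resolution, and a lossless reshuffle turns them into one high-resolution map, avoiding deconvolution checkerboard artifacts. A minimal numpy sketch of the rearrangement itself (layout follows the common ESPCN convention; an assumption, not this paper's code):

```python
import numpy as np

def pixel_shuffle(x, r=2):
    """Sub-pixel rearrangement: (C*r*r, H, W) -> (C, H*r, W*r).

    out[c, y*r + a, x*r + b] = in[c*r*r + a*r + b, y, x]
    """
    crr, h, w = x.shape
    c = crr // (r * r)
    x = x.reshape(c, r, r, h, w)                  # split channels into (c, a, b)
    return x.transpose(0, 3, 1, 4, 2).reshape(c, h * r, w * r)

lowres = np.arange(4 * 3 * 3, dtype=float).reshape(4, 3, 3)
hires = pixel_shuffle(lowres)  # (1, 6, 6): 2x spatial resolution, no new values
```

Because the operation is a pure permutation, every output value is produced by a learned convolution rather than by zero-insertion, which is what removes the deconvolution artifacts.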
- Published
- 2024
- Full Text
- View/download PDF
13. Scene representation using a new two-branch neural network model.
- Author
-
Parseh, Mohammad Javad, Rahmanimanesh, Mohammad, Keshavarzi, Parviz, and Azimifar, Zohreh
- Subjects
- *
ARTIFICIAL neural networks , *IMAGE representation , *DEEP learning , *COMPUTER vision , *RECOGNITION (Psychology) , *FEATURE extraction , *CONVOLUTIONAL neural networks - Abstract
Scene classification and recognition have always been among the most challenging tasks of scene understanding due to the inherent ambiguity in visual scenes. The core of scene classification and recognition tasks is scene representation. Deep learning advances in computer vision, especially deep CNNs, have significantly improved scene representation in the last decade. Deep convolutional features extracted from deep CNNs provide discriminative representations of images and are widely used in various computer vision tasks, such as scene classification. Deep convolutional features capture the appearance characteristics of the image and spatial information about different image regions. Meanwhile, the semantic and context information obtained from high-level concepts about scene images, such as objects and their relationships, can significantly contribute to identifying scene images. Therefore, in this paper, we divide visual scenes into two categories, object-based and layout-based. Object-based scenes have scene-specific objects and can be described and identified based on those objects. In contrast, layout-based scenes do not have scene-specific objects and are described and identified based on the appearance and layout of the image. This paper proposes a new neural network model for representing and classifying visual scenes, which we call G-CNN (GNN-CNN). The proposed model includes two modules, feature extraction and feature fusion, and the feature extraction module is composed of visual and semantic branches. The visual branch is responsible for extracting deep CNN features from the image, and the semantic branch is responsible for extracting semantic GNN features from the scene graph corresponding to the image. The feature fusion module is a novel two-stream neural network that fuses the CNN and GNN feature vectors to produce a comprehensive representation of the scene image. Finally, a fully-connected classifier assigns the obtained comprehensive feature vector to one of the pre-defined categories. The proposed model has been evaluated on three benchmark scene datasets, UIUC Sports, MIT67, and SUN397, obtaining classification accuracies of 99.91%, 96.01%, and 85.32%, respectively. In addition, a new dataset named Scene40, introduced in our previous paper, is used for further evaluation of the proposed method. Comparison results based on classification accuracy show that the proposed model outperforms the best previous methods on the three benchmark scene datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. Gated weighted normative feature fusion for multispectral object detection.
- Author
-
Wu, Xianjun, Jiang, Xian, and Dong, Ligang
- Subjects
- *
OBJECT recognition (Computer vision) , *FEATURE extraction , *SPINE , *MULTISPECTRAL imaging - Abstract
Multispectral image pairs provide independent and complementary information that more comprehensively describes detection targets, thereby improving the robustness and reliability of object detectors. The performance of an object detector depends on how cross-modality features are extracted and fused. To exploit the different modalities fully, we propose a lightweight yet effective cross-modality feature fusion approach named gated weighted normative feature fusion. In the feature extraction stage, our dual-input backbone network extracts richer and more useful features. In the feature fusion stage, the fusion module eliminates redundant features, dynamically weighs the importance of the two image features, and further normalizes the fused features. Experiments and ablation studies on several publicly available datasets demonstrate the effectiveness of our method, which achieved 80.3% mAP50 and 41.8% mAP on the FLIR dataset, and 98.0% mAP50 and 68.0% mAP on the LLVIP dataset. In particular, the inference speed of our method is twice as fast as the current state-of-the-art method. [ABSTRACT FROM AUTHOR]
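The gate-weight-normalize pattern described above can be sketched generically: a sigmoid gate scores each position, the two modality features are blended by the gate, and the result is renormalized. This is a toy numpy sketch of the general pattern, not the paper's module; the fixed weight `w` stands in for a learned projection, and the zero-mean/unit-variance normalization is an assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(f_rgb, f_ir, w):
    """Gate-weighted fusion of two modality feature maps, then normalization."""
    gate = sigmoid((f_rgb - f_ir) @ w)[..., None]   # per-position gate in (0, 1)
    fused = gate * f_rgb + (1.0 - gate) * f_ir      # dynamic modality weighting
    return (fused - fused.mean()) / (fused.std() + 1e-12)  # normative step

rng = np.random.default_rng(3)
rgb, ir = rng.random((16, 8)), rng.random((16, 8))
w = rng.random(8)                                   # stand-in for learned weights
out = gated_fusion(rgb, ir, w)
```

The gate lets the network lean on infrared features where visible-light features are weak (e.g. at night) and vice versa, instead of averaging the modalities uniformly.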
- Published
- 2024
- Full Text
- View/download PDF
15. Robust feature aggregation network for lightweight and effective remote sensing image change detection.
- Author
-
You, Zhi-Hui, Chen, Si-Bao, Wang, Jia-Xin, and Luo, Bin
- Subjects
- *
REMOTE sensing , *DEEP learning , *SOURCE code , *PROBLEM solving , *SPINE - Abstract
In the remote sensing (RS) image change detection (CD) task, many existing CD methods focus on improving accuracy but usually have more parameters, higher computational costs, and heavier memory usage. Designing a lightweight, performance-sustainable CD model that is more compatible with real-world applications is an urgent problem. Therefore, we propose a lightweight change detection network, called robust feature aggregation network (RFANet). To improve the representative capability of the weaker features extracted from a lightweight backbone, a feature reinforcement module (FRM) is proposed. FRM allows the current-level feature to densely interact and fuse with features at other levels, thus accomplishing the complementarity of fine-grained details and semantic information. Considering the massive objects with rich correlations in RS images, we design a semantic split-aggregation module (SSAM) to better capture the global semantic information of changed objects. Besides, we present a lightweight decoder containing a channel interaction module (CIM), which allows multi-level refined difference features to emphasize changed areas and suppress background and pseudo-changes. Extensive experiments on four challenging RS image CD datasets illustrate that RFANet achieves competitive performance with fewer parameters and lower computational costs. The source code is available at https://github.com/Youzhihui/RFANet. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
16. Identification of DNase I hypersensitive sites in the human genome by multiple sequence descriptors.
- Author
-
Jin, Yan-Ting, Tan, Yang, Gan, Zhong-Hua, Hao, Yu-Duo, Wang, Tian-Yu, Lin, Hao, and Tang, Bo
- Subjects
- *
FEATURE extraction , *CLASSIFICATION algorithms , *GENETIC transcription regulation , *HUMAN genome , *RANDOM forest algorithms - Abstract
• Identification of DHSs can help in understanding the mechanisms of disease development and treatment. • A multi-dimensional feature fusion strategy was used for feature extraction from DNase samples. • An overall prediction accuracy of 0.859 was achieved with an AUC value of 0.837. DNase I hypersensitive sites (DHSs) are chromatin regions highly sensitive to DNase I enzymes. Studying DHSs is crucial for understanding complex transcriptional regulation mechanisms and localizing cis-regulatory elements (CREs). Numerous studies have indicated that disease-related loci are often enriched in DHS regions, underscoring the importance of identifying DHSs. Although wet experiments exist for DHS identification, they are often labor-intensive, so there is a strong need to develop computational methods for this purpose. In this study, we used experimental data to construct a benchmark dataset. Seven feature extraction methods were employed to capture information about human DHSs, and the F-score was applied to filter the features. By comparing the prediction performance of various classification algorithms through five-fold cross-validation, random forest was chosen for the final model construction. The model produces an overall prediction accuracy of 0.859 with an AUC value of 0.837. We hope that this model can assist scholars conducting DNase research in identifying these sites. [ABSTRACT FROM AUTHOR]
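The F-score filtering mentioned above is a simple univariate criterion for binary classification: for each feature, it compares the separation of the two class means against the within-class variances. A minimal numpy sketch using the standard F-score formula (the threshold for keeping features is the authors' choice and is not reproduced here):

```python
import numpy as np

def f_score(X, y):
    """Per-feature F-score for a binary-labeled feature matrix X (labels 0/1)."""
    pos, neg = X[y == 1], X[y == 0]
    # between-class separation of each feature's mean...
    num = (pos.mean(0) - X.mean(0)) ** 2 + (neg.mean(0) - X.mean(0)) ** 2
    # ...relative to the within-class variances
    den = pos.var(0, ddof=1) + neg.var(0, ddof=1)
    return num / (den + 1e-12)

rng = np.random.default_rng(4)
y = rng.integers(0, 2, 200)
X = rng.normal(size=(200, 5))
X[:, 0] += 3 * y                  # make feature 0 strongly class-informative
scores = f_score(X, y)
best = int(np.argmax(scores))     # feature 0 gets the highest F-score
```

Features are then ranked by score and the top-ranked subset is passed to the classifier (random forest in this study).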
- Published
- 2024
- Full Text
- View/download PDF
17. A novel sea-land segmentation network for enhanced coastline extraction using satellite remote sensing images.
- Author
-
Feng, Jiangfan, Wang, Shiyu, and Gu, Zhujun
- Subjects
- *
COASTS , *EMERGENCY management , *COASTAL development , *LAND cover , *SUSTAINABLE development , *IMAGE segmentation , *REMOTE sensing - Abstract
• Improving coastline edge detail: the EDS module enhances edge fitting. • Bridging the semantic gap: CSDS and AFM collaboration enhances semantic information. • Outperforms on mIoU: CSAFNet achieves a remarkable 96.72% mIoU value. The extraction of coastlines from remote sensing images is vital for promoting sustainable development in coastal areas, conserving marine environments, strengthening disaster response capabilities, and supporting scientific research. However, current coastline detection approaches using remote sensing face challenges related to resolution, terrain, boundary, and data, requiring accurate solutions for reliability. Here, we introduce the Collaborative Supervision and Attention Fusion (CSAFNet) model for pixel-level sea-land segmentation, with the primary goal of improving the accuracy of coastline extraction. The model integrates the Edge Deep Supervision (EDS) module to enhance coastline edge detail fitting. Additionally, the Collaborative Semantic Deep Supervision (CSDS) module and Attention Fusion Module (AFM) collaborate to bridge the semantic gap between different hierarchical features, resulting in a more precise and detailed delineation of coastlines. Experimental validation on the publicly available SLSD (sea-land segmentation data) dataset has demonstrated superiority over various advanced methods, with an impressive mIoU value of 96.72%. Through simple optimization, detailed and rich coastlines can be extracted, validating the feasibility of coastline extraction. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. Small Object Detection in Aerial Drone Imagery based on YOLOv8.
- Author
-
Junyu Pan and Yujun Zhang
- Subjects
COMPUTATIONAL complexity ,NECK - Abstract
In recent years, the utilization of unmanned aerial vehicles (UAVs) for aerial target detection has gained significant attention due to their high-altitude perspective and maneuverability, which offer novel opportunities and tremendous potential in this field. However, detecting targets in UAV aerial images remains highly challenging due to the presence of numerous small targets with limited feature information, as well as issues like target occlusion and complex backgrounds that severely impact detection accuracy. To address these challenges, we propose a detection model called BDC-YOLOv8 that aims to enhance accuracy for small targets while minimizing computational complexity. Specifically, we augment the YOLOv8 architecture with a dedicated detection head tailored for small targets. Additionally, we restructure the neck network of the model to better extract and fuse feature information from targets with significant scale variations. Furthermore, we introduce the concept of DynamicHead, placing various attention mechanisms suitable for our task ahead of the original detection head and thereby enhancing the model's capability to detect objects of different scales against complex backgrounds. Moreover, we introduce the Convolutional Block Attention Module (CBAM) to identify regions of interest in densely populated areas. Extensive experiments conducted on the VisDrone2019 dataset yield promising results: our model achieves a mean Average Precision (mAP) score of 38% and an AP50 score of 59.6%, improvements of 2.5% in mAP and 3.7% in AP50 over the original YOLOv8 model. Notably, our model demonstrates a significant enhancement in detecting small targets, with a 4.1% increase in the APs metric. [ABSTRACT FROM AUTHOR]
- Published
- 2024
19. Slow feature-based feature fusion methodology for machinery similarity-based prognostics.
- Author
-
Xue, Bin, Xu, Haoyan, Huang, Xing, and Xu, Zhongbin
- Subjects
SIGNAL-to-noise ratio ,PLANT maintenance ,ROLLER bearings ,TREND analysis ,MACHINERY - Abstract
Similarity-based prediction methods utilize degradation trend analysis based on degradation indicators (DIs). These methods are gaining prominence in industrial predictive maintenance because they effectively address prognostics for machines with unknown failure mechanisms. However, current studies often neglect the discrepancies in degradation trends when constructing DIs from multi-sensor data and lack automatic normalization of operating regimes during feature fusion. In this study, a feature fusion methodology based on a signal-to-noise ratio metric that leverages slow feature analysis (SFA) is proposed. This customized metric utilizes SFA to quantify degradation trend discrepancies of constructed DIs, while automatically filtering out the effects of multiple operating regimes during feature fusion. The effectiveness and superiority of the proposed method are demonstrated using publicly available aero-engine and rolling bearing datasets.
• An improved version of the signal-to-noise ratio for feature fusion in similarity-based prognostics.
• Discrepancy information of prognostic variables is characterized through differential analysis.
• Slow feature analysis is utilized to normalize multiple operating regimes automatically. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
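SFA's "slowness" objective, central to the abstract above, is the mean squared first difference of a standardized signal: trend-like DIs score low, noisy ones high. A hedged sketch in which inverse slowness supplies fusion weights; this weighting rule is a stand-in for illustration, not the paper's actual SNR metric:

```python
import numpy as np

def slowness(x):
    # SFA's slowness measure: mean squared first difference of a
    # zero-mean, unit-variance signal (lower = slower = more trend-like).
    z = (x - x.mean()) / (x.std() + 1e-12)
    return np.mean(np.diff(z) ** 2)

def fuse_dis(features):
    # Weight each candidate DI by inverse slowness so that smooth,
    # trend-like indicators dominate the fused health indicator.
    s = np.array([slowness(f) for f in features])
    w = 1.0 / (s + 1e-12)
    w /= w.sum()
    return sum(wi * (f - f.mean()) / (f.std() + 1e-12)
               for wi, f in zip(w, features))
```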
20. A robust feature matching algorithm based on adaptive feature fusion combined with image superresolution reconstruction.
- Author
-
Huangfu, Wenjun, Ni, Cui, Wang, Peng, and Zhang, Yingying
- Subjects
CONVOLUTIONAL neural networks ,FEATURE extraction ,IMAGE reconstruction ,DEEP learning ,HIGH resolution imaging ,IMAGE reconstruction algorithms ,IMAGE registration - Abstract
With the development of image feature matching technology, feature matching algorithms based on deep learning have achieved excellent results, but in scenarios with low texture or extreme perspective changes, matching accuracy is still difficult to guarantee. In this paper, a superresolution reconstruction method based on Residual-ESPCN (efficient subpixel convolutional neural network) is proposed and built on LoFTR (local feature matching with transformers). The superresolution method is used in place of the interpolation method used in ASFF (adaptive spatial feature fusion) to increase the image resolution, enhance the detailed information of the image, and make the extracted features richer. Then, ASFF is introduced into the local feature extraction module of LoFTR, which alleviates the inconsistency of information transmission between different scale features of the feature pyramid and lessens the amount of information lost during transmission from low- to high-resolution levels. Moreover, to improve the adaptability of the algorithm to different scenarios, OTSU is introduced to adaptively calculate the feature matching threshold. The experimental results show that in different indoor and outdoor scenarios, our proposed algorithm effectively improves the adaptability of feature matching and achieves good results in terms of the area under the curve (AUC), accuracy and recall. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
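ESPCN's sub-pixel ("pixel shuffle") step, which the superresolution stage above builds on, rearranges an r²-fold channel expansion into spatial resolution. A minimal NumPy sketch following the usual (C·r², H, W) → (C, H·r, W·r) convention:

```python
import numpy as np

def pixel_shuffle(x, r):
    # x: (C*r*r, H, W) -> (C, H*r, W*r). Each group of r*r channels is
    # rearranged into an r x r spatial block, so resolution grows by r
    # in both directions without any interpolation.
    crr, h, w = x.shape
    c = crr // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)      # (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)
```

The last convolution of an ESPCN-style network emits C·r² channels at low resolution; this rearrangement then produces the high-resolution output in one cheap reshape.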
21. Advancements in Remote Sensing Image Dehazing: Introducing URA-Net with Multi-Scale Dense Feature Fusion Clusters and Gated Jump Connection.
- Author
-
Liu, Hongchi, Deng, Xing, and Shao, Haijian
- Subjects
CONVOLUTIONAL neural networks ,REMOTE sensing ,DEEP learning ,IMAGE fusion ,ATMOSPHERIC models - Abstract
The degradation of optical remote sensing images due to atmospheric haze poses a significant obstacle, profoundly impeding their effective utilization across various domains. Dehazing methodologies have emerged as pivotal components of image preprocessing, fostering an improvement in the quality of remote sensing imagery. This enhancement renders remote sensing data more indispensable, thereby enhancing the accuracy of target identification. Conventional defogging techniques based on simplistic atmospheric degradation models have proven inadequate for mitigating non-uniform haze within remotely sensed images. In response to this challenge, a novel UNet Residual Attention Network (URA-Net) is proposed. This paradigmatic approach materializes as an end-to-end convolutional neural network distinguished by its utilization of multi-scale dense feature fusion clusters and gated jump connections. The essence of our methodology lies in local feature fusion within dense residual clusters, enabling the extraction of pertinent features from both preceding and current local data, depending on contextual demands. The intelligently orchestrated gated structures facilitate the propagation of these features to the decoder, resulting in superior outcomes in haze removal. Empirical validation through a plethora of experiments substantiates the efficacy of URA-Net, demonstrating its superior performance compared to existing methods when applied to established datasets for remote sensing image defogging. On the RICE-1 dataset, URA-Net achieves a Peak Signal-to-Noise Ratio (PSNR) of 29.07 dB, surpassing the Dark Channel Prior (DCP) by 11.17 dB, the All-in-One Network for Dehazing (AOD) by 7.82 dB, the Optimal Transmission Map and Adaptive Atmospheric Light For Dehazing (OTM-AAL) by 5.37 dB, the Unsupervised Single Image Dehazing (USID) by 8.0 dB, and the Superpixel-based Remote Sensing Image Dehazing (SRD) by 8.5 dB. 
Particularly noteworthy, on the SateHaze1k dataset, URA-Net attains preeminence in overall performance, yielding defogged images characterized by consistent visual quality. This underscores the contribution of the research to the advancement of remote sensing technology, providing a robust and efficient solution for alleviating the adverse effects of haze on image quality. Graphic Abstract [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. Research on badminton take-off recognition method based on improved deep learning.
- Author
-
Lianju, Lu and Haiying, Zhang
- Abstract
Because of the fast take-off speed in badminton, a single action recognition method cannot quickly and accurately identify the action. Therefore, a new badminton take-off recognition method based on improved deep learning is proposed to capture badminton take-offs accurately. Badminton sports videos are collected, and images of the athletes' activity areas are obtained by tracking the moving targets in competition videos. The static characteristics of the players' take-off actions are extracted from these images using 3D ConvNets. From the human joint points in the player's target tracking image, the human skeleton sequence is constructed using a 2D coordinate pseudo-image and a 2D skeleton data design algorithm, and the dynamic characteristics of the take-off action are extracted from the skeleton sequence using an LSTM (Long Short-Term Memory) network. After the static and dynamic features are fused by weighted summation, the fusion result is input into a convolutional neural network (CNN) to complete take-off recognition. The CNN pooling layer is improved with adaptive pooling, and network convergence is accelerated through batch normalization to further optimize the recognition results. Experiments show that the human skeleton model accurately matches human movements and assists in extracting action features. The improved CNN greatly improves the accuracy of take-off recognition. When recognizing real images, it can accurately identify human movements and judge whether a take-off action is present. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
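The weighted-summation fusion of static (3D ConvNets) and dynamic (LSTM) features described above is a one-liner; the weight `alpha` below is hypothetical, since the abstract does not report its value:

```python
def weighted_fusion(static, dynamic, alpha=0.6):
    # Weighted-sum fusion of two same-length feature vectors.
    # alpha is a hypothetical mixing weight; in the paper it would be
    # chosen (or learned) to balance the two modalities.
    assert len(static) == len(dynamic)
    return [alpha * s + (1.0 - alpha) * d for s, d in zip(static, dynamic)]
```

Weighted summation keeps the fused vector the same length as each input, unlike concatenation, so the downstream CNN classifier needs no wider input layer.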
23. Harmonizing local and global features: enhanced hand gesture segmentation using synergistic fusion of CNN and transformer networks.
- Author
-
Wang, Shi, Yang, Ning, Liu, Maohua, Tian, Qing, and Zhang, Shihui
- Abstract
Hand gesture segmentation is an important research topic in computer vision. Despite ongoing efforts, achieving optimal gesture segmentation remains challenging due to factors like gesture morphology and intricate backgrounds. In light of these challenges, we propose a novel hand gesture segmentation approach that strategically combines the strengths of Convolutional Neural Networks (CNN) for local feature extraction and Transformer networks for global feature integration. More specifically, we design two feature fusion modules. One employs an attention mechanism to learn how to fuse features extracted by the CNN and the Transformer. The second utilizes a combination of group convolution and activation functions to implement gating mechanisms, enhancing the response of crucial features while minimizing interference from weaker ones. Our proposed method achieves mIoU scores of 93.53%, 97.25%, and 90.39% on the OUHANDS, HGR1, and EgoHands hand gesture datasets, respectively, outperforming state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
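The second fusion module above is a gate: a learned weight in (0, 1) decides, per dimension, how much of the CNN branch versus the Transformer branch survives. A hedged NumPy sketch with a hypothetical gate matrix `wg` (trained jointly in the real model; the group convolution is replaced here by a plain dense product for brevity):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(f_cnn, f_trans, wg):
    # f_cnn, f_trans: (d,) branch features. wg: (2d, d) hypothetical
    # gate weights. The sigmoid output g acts as a soft per-dimension
    # switch between the local (CNN) and global (Transformer) branches.
    g = sigmoid(np.concatenate([f_cnn, f_trans]) @ wg)
    return g * f_cnn + (1.0 - g) * f_trans
```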
24. HGANet-23: a novel architecture for human gait analysis based on deep neural network and improved satin bowerbird optimization.
- Author
-
Jahangir, Faiza, Khan, Muhammad Attique, Damaševičius, Robertas, Alblehai, Fahad, Alzahrani, Ahmed Ibrahim, Shabaz, Mohammad, Keshta, Ismail, and Pandey, Yogadhar
- Abstract
Human gait is an essential biometric feature in the area of computer vision research. Over the past ten years, there has been a growing demand for a non-contact biometric approach to identify potential candidates, mainly since the global COVID-19 pandemic emerged. Gait recognition involves automatically capturing and extracting characteristics of human movement, which are subsequently utilized to verify the identity of a moving individual. Nevertheless, covariates like walking while carrying a bag, changing clothes, environmental conditions, and any unusual gait patterns all have an impact on gait recognition accuracy. This paper presents a new end-to-end deep learning framework for human gait recognition. The proposed framework contains a few important steps that help improve recognition accuracy. A contrast enhancement technique named Enhancing Human Body Shape and Reducing Noise is proposed in the initial step and used for dataset augmentation. The second step involves deep learning architecture development, namely the proposed GNET-23 model and a fine-tuned pre-trained AlexNet model. Both models are trained on the selected datasets and later extract deep features from the average pooling layer. A novel parallel correlation fusion technique is proposed to fuse the richer information of both models, which is further optimized using an improved Satin Bowerbird optimization algorithm. Finally, the most optimal features are classified using neural network and nearest-neighbor classifiers. The experiment was conducted using four different angles of the publicly accessible CASIA-B dataset, resulting in mean accuracy scores of 91.6%, 96.2%, 94.3%, and 96.8%, respectively. The proposed framework surpasses other deep learning networks and recently published techniques in both accuracy and processing speed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. IENet: inheritance enhancement network for video salient object detection.
- Author
-
Jiang, Tao, Wang, Yi, Hou, Feng, and Wang, Ruili
- Subjects
FEATURE extraction ,TRANSFORMER models ,VIDEOS - Abstract
Effective utilization of spatiotemporal information is essential for improving the accuracy and robustness of Video Salient Object Detection (V-SOD). However, current methods have not fully utilized historical frame information, ultimately resulting in insufficient integration of complementary semantic information. To address this issue, we propose a novel Inheritance Enhancement Network (IENet) based on the Transformer. The core of IENet is a Heritable Multi-Frame Attention (HMA) module, which fully exploits long-term context and frame-aware temporal modeling in feature extraction through unidirectional cross-frame enhancement. In contrast to existing methods, our heritable strategy is based on a unidirectional inheritance model using attention maps, which ensures that information propagation for each frame is consistent and orderly, avoiding additional interference. Furthermore, we propose an auxiliary attention loss that uses the inherited attention maps to direct the network to focus more on target regions. The experimental results of our IENet reveal its effectiveness in handling challenging scenes on five popular benchmark datasets. For instance, on VOS and DAVSOD, our method achieves MAE scores of 0.042 and 0.070, outperforming other competitive models. Particularly, IENet excels in inheriting finer details from historical frames even in complex environments. The module and predicted maps are publicly available at https://github.com/TOMMYWHY/IENet. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
26. Remaining useful life prediction based on spatiotemporal autoencoder.
- Author
-
Xu, Tao, Pi, Dechang, and Zeng, Shi
- Subjects
REMAINING useful life ,REPRESENTATIONS of graphs ,TIME series analysis ,PREDICTION models ,PROBLEM solving ,DEEP learning - Abstract
Remaining Useful Life (RUL) prediction has received a lot of attention as the core of prognostics and health management (PHM) technology. Deep learning-based RUL prediction methods are currently the most popular, and to address the fact that most current deep RUL prediction studies do not consider the structural information between sensors, we propose a spatiotemporal autoencoder (STAE)-based RUL prediction method. The method extracts time-domain information from the data through a temporal convolutional network. It obtains the structural information of the sensors by converting the time series data into a graph structure using the maximal information coefficient and then performing graph representation learning. The two resulting features are fused with a feature fusion method based on the graph attention mechanism, and finally the fused features are used for RUL prediction. To validate the effectiveness of STAE, we conducted experiments on the simulated dataset C-MAPSS and the real satellite dataset SCS-PSS; our proposed method outperforms the baseline methods on both datasets. The results suggest that considering structural information between sensors in a deep RUL prediction model can improve prediction accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
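The STAE abstract above fuses a temporal feature (from the temporal convolutional network) with a structural feature (from graph representation learning) via attention. A simplified stand-in: score each feature vector against a query and combine with softmax weights; the query here is a fixed mean, whereas the real module learns its attention parameters:

```python
import numpy as np

def attention_fuse(feats):
    # feats: list of (d,) feature vectors (e.g., temporal + structural).
    # Each vector gets a scalar attention weight via a softmax over
    # scaled dot-product scores with a query vector (a fixed stand-in).
    F = np.stack(feats)                     # (n, d)
    q = F.mean(axis=0)                      # hypothetical query
    scores = F @ q / np.sqrt(F.shape[1])
    w = np.exp(scores - scores.max())       # numerically stable softmax
    w /= w.sum()
    return w @ F                            # convex combination, (d,)
```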
27. GHAFNet: Global-context hierarchical attention fusion method for traffic object detection.
- Author
-
Cui-jin, Li, Zhong, Qu, and Sheng-ye, Wang
- Subjects
TRAFFIC monitoring ,PYRAMIDS - Abstract
Small-object detection has become a hot issue in complex traffic scenes. A global-context multilevel fusion attention detection method for small objects is proposed in this paper. First, a global context feature fusion network model is designed with cross-stage partial DarkNet (CSPDarkNet) as the backbone to capture global context semantic information and refine local information. To further refine the local information, a hierarchical hybrid attention module is designed that uses global average pooling to obtain the H-direction weight matrix, fuses it with the W-direction weight matrix, fuses the result with the channel-direction weight matrix, and finally obtains a feature map with multidimensional weights. Second, to increase the receptive field of the multidimensional weighted feature map, atrous convolution is added to the spatial pyramid pooling (SPP) module. To improve small-object detection accuracy, a 160 × 160 small-object detection head is added. Finally, the Focal-EIOU (efficient intersection over union) loss function is adopted for better convergence during training. Full experiments have been carried out on the open traffic datasets Cityscapes and KITTI, and the model proposed in this paper is the best-performing method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
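The hierarchical hybrid attention above derives an H-direction weight vector, a W-direction vector, and channel weights from global average pooling, then fuses them. A minimal sketch under the assumption that "fuse" means multiplying the three broadcast weight maps; the paper's exact fusion order and any learned parameters are not specified in the abstract:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hierarchical_attention(x):
    # x: (C, H, W). Global average pooling along W gives H-direction
    # weights, along H gives W-direction weights, and over both spatial
    # dims gives channel weights; all three gate the map multiplicatively.
    wh = sigmoid(x.mean(axis=(0, 2)))       # (H,)
    ww = sigmoid(x.mean(axis=(0, 1)))       # (W,)
    wc = sigmoid(x.mean(axis=(1, 2)))       # (C,)
    return x * wc[:, None, None] * wh[None, :, None] * ww[None, None, :]
```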
28. A MENTAL ILLNESS DETECTION MODEL FOR COLLEGE STUDENTS BASED ON BODY BEHAVIOR AND FACIAL EXPRESSION FEATURES.
- Author
-
LI, KEKE, YAO, JIAN, LEUNG, CHUN KAI, and CHEN, AIGUO
- Subjects
- *
MENTAL depression , *FACIAL expression , *MENTAL illness , *MENTAL health , *PHYSICAL activity , *DEEP learning - Abstract
Mental illnesses such as depression are typically neurologically related psychological disorders that affect people’s mood, thinking and behavior. As the number of students who are concerned about their mental health continues to rise, depression has emerged as a mental health concern that has a significant impact on both students’ academic performance and overall lives. To identify depression in students at an earlier stage, the purpose of this study was to provide a potentially unique approach. An approach to the detection of mental illnesses that is based on deep learning networks is proposed in this paper. First, facial expression and physical activity data are utilized for detecting depression. Second, the transformer model is utilized to extract the characteristics of the individual’s physical behavior, and the multiregional attention network (MRAN) is utilized to extract the characteristics of the individual’s emotions. The information obtained from the two modalities is complementary. Finally, at the fusion stage, this work applies the classification prediction of depression and nondepression (normal) at the decision level. This is done to ensure that the respective modal properties learned by the two channels are preserved in their entirety. We have demonstrated that our strategy is highly effective by performing experimental validation using a dataset that we developed ourselves. This effective remedy makes it possible to identify depression in students at an earlier stage. It is anticipated that the findings of this study will provide an efficient screening tool for depression to educational institutions and organizations that focus on mental health, hence assisting students in receiving essential assistance and intervention at an earlier stage. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. Predicting the Remaining Life of Centrifugal Pump Bearings Using the KPCA–LSTM Algorithm.
- Author
-
Zhu, Rongsheng, Zhang, Xinyu, Huang, Qian, Li, Sihan, and Fu, Qiang
- Subjects
- *
LIFE cycles (Biology) , *CENTRIFUGAL pumps , *ROLLER bearings , *TIME series analysis , *PREDICTION models - Abstract
This paper proposes a data-driven prediction scheme for the remaining life of centrifugal pump bearings based on the KPCA–LSTM network. A centrifugal pump bearing fault experiment bench is built to collect data, and the performance of time domain, frequency domain, and time-frequency domain characteristics under different working conditions is analyzed. Time domain characteristics, frequency domain characteristics, wavelet packet decomposition energy characteristics, and CEEMDAN energy features are found to be able to capture fault information under different working conditions. Therefore, 43 sensitive features are determined from the time domain, frequency domain, and time-frequency domain. Through the analysis of XJTU-SY bearing life cycle data and based on the weighted scores of monotonicity, robustness, and trend indicators, twelve outstanding characteristics of the bearing in the whole life cycle are selected, and a one-dimensional feature quantity that can characterize the life-degradation process of the centrifugal pump bearing is constructed after KPCA dimension reduction processing. The LSTM network, sensitive to time series, is selected to predict and analyze the constructed one-dimensional feature trend, and the prediction effects of the BP network and the CNN network are compared. The results show that this method has advantages in prediction accuracy and model training time. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
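The KPCA step in the centrifugal-pump abstract above compresses the twelve selected degradation features into a single health indicator before the LSTM sees them. A minimal RBF-kernel KPCA to one dimension; `gamma` is a hypothetical hyperparameter, and the real pipeline would also project new samples rather than only training data:

```python
import numpy as np

def rbf_kpca_1d(X, gamma=0.5):
    # X: (n, d) feature matrix. Build the RBF kernel matrix,
    # double-center it, and project onto the leading eigenvector to
    # obtain a one-dimensional degradation indicator.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-gamma * sq)
    n = K.shape[0]
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one   # centered kernel
    vals, vecs = np.linalg.eigh(Kc)              # ascending eigenvalues
    return vecs[:, -1] * np.sqrt(max(vals[-1], 0.0))
```

Because the kernel is centered, the resulting scores sum to (numerically) zero; the LSTM then models the trend of this single indicator over time.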
30. Human Multi-Activities Classification Using mmWave Radar: Feature Fusion in Time-Domain and PCANet.
- Author
-
Lin, Yier, Li, Haobo, and Faccio, Daniele
- Subjects
- *
HUMAN activity recognition , *PRINCIPAL components analysis , *RANDOM forest algorithms , *POINT cloud , *ACTIVITIES of daily living - Abstract
This study introduces an innovative approach by incorporating statistical offset features, range profiles, time–frequency analyses, and azimuth–range–time characteristics to effectively identify various human daily activities. Our technique utilizes nine feature vectors consisting of six statistical offset features and three principal component analysis network (PCANet) fusion attributes. These statistical offset features are derived from combined elevation and azimuth data, considering their spatial angle relationships. The fusion attributes are generated through concurrent 1D networks using CNN-BiLSTM. The process begins with the temporal fusion of 3D range–azimuth–time data, followed by PCANet integration. Subsequently, a conventional classification model is employed to categorize a range of actions. Our methodology was tested with 21,000 samples across fourteen categories of human daily activities, demonstrating the effectiveness of our proposed solution. The experimental outcomes highlight the superior robustness of our method, particularly when using the Margenau–Hill Spectrogram for time–frequency analysis. When employing a random forest classifier, our approach outperformed other classifiers in terms of classification efficacy, achieving an average sensitivity, precision, F1, specificity, and accuracy of 98.25%, 98.25%, 98.25%, 99.87%, and 99.75%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
31. Semantic Segmentation Network Based on Adaptive Attention and Deep Fusion Utilizing a Multi-Scale Dilated Convolutional Pyramid.
- Author
-
Zhao, Shan, Wang, Zihao, Huo, Zhanqiang, and Zhang, Fukai
- Subjects
- *
FEATURE selection , *FEATURE extraction , *PYRAMIDS , *DEEP learning - Abstract
Deep learning has recently made significant progress in semantic segmentation. However, the current methods face critical challenges. The segmentation process often lacks sufficient contextual information and attention mechanisms, low-level features lack semantic richness, and high-level features suffer from poor resolution. These limitations reduce the model's ability to accurately understand and process scene details, particularly in complex scenarios, leading to segmentation outputs that may have inaccuracies in boundary delineation, misclassification of regions, and poor handling of small or overlapping objects. To address these challenges, this paper proposes a Semantic Segmentation Network Based on Adaptive Attention and Deep Fusion with the Multi-Scale Dilated Convolutional Pyramid (SDAMNet). Specifically, the Dilated Convolutional Atrous Spatial Pyramid Pooling (DCASPP) module is developed to enhance contextual information in semantic segmentation. Additionally, a Semantic Channel Space Details Module (SCSDM) is devised to improve the extraction of significant features through multi-scale feature fusion and adaptive feature selection, enhancing the model's perceptual capability for key regions and optimizing semantic understanding and segmentation performance. Furthermore, a Semantic Features Fusion Module (SFFM) is constructed to address the semantic deficiency in low-level features and the low resolution in high-level features. The effectiveness of SDAMNet is demonstrated on two datasets, revealing significant improvements in Mean Intersection over Union (MIOU) by 2.89% and 2.13%, respectively, compared to the Deeplabv3+ network. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
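Dilated (atrous) convolution, the building block of the DCASPP module above, spaces kernel taps `d` samples apart so the receptive field grows without adding parameters. A 1-D pure-Python illustration with 'valid' boundary handling:

```python
def dilated_conv1d(x, k, d):
    # 'Valid' 1-D dilated convolution (correlation form): tap j of the
    # kernel reads x[i + j*d], so a kernel of length m covers a span of
    # (m-1)*d + 1 input samples instead of m.
    span = (len(k) - 1) * d
    return [sum(kj * x[i + j * d] for j, kj in enumerate(k))
            for i in range(len(x) - span)]
```

With d=1 this reduces to an ordinary convolution; stacking several dilation rates in parallel, as ASPP-style pyramids do, captures context at multiple scales from the same feature map.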
32. BMSeNet: Multiscale Context Pyramid Pooling and Spatial Detail Enhancement Network for Real-Time Semantic Segmentation.
- Author
-
Zhao, Shan, Zhao, Xin, Huo, Zhanqiang, and Zhang, Fukai
- Subjects
- *
SPACE perception , *DATA mining , *FEATURE extraction , *PYRAMIDS , *SPEED - Abstract
Most real-time semantic segmentation networks use shallow architectures to achieve fast inference speeds. This approach, however, limits a network's receptive field. Concurrently, feature information extraction is restricted to a single scale, which reduces the network's ability to generalize and maintain robustness. Furthermore, loss of image spatial details negatively impacts segmentation accuracy. To address these limitations, this paper proposes a Multiscale Context Pyramid Pooling and Spatial Detail Enhancement Network (BMSeNet). First, to address the limitation of singular semantic feature scales, a Multiscale Context Pyramid Pooling Module (MSCPPM) is introduced. By leveraging various pooling operations, this module efficiently enlarges the receptive field and better aggregates multiscale contextual information. Moreover, a Spatial Detail Enhancement Module (SDEM) is designed, to effectively compensate for lost spatial detail information and significantly enhance the perception of spatial details. Finally, a Bilateral Attention Fusion Module (BAFM) is proposed. This module leverages pixel positional correlations to guide the network in assigning appropriate weights to the features extracted from the two branches, effectively merging the feature information of both branches. Extensive experiments were conducted on the Cityscapes and CamVid datasets. Experimental results show that the proposed BMSeNet achieves a good balance between inference speed and segmentation accuracy, outperforming some state-of-the-art real-time semantic segmentation methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
33. SOD-YOLO: Small-Object-Detection Algorithm Based on Improved YOLOv8 for UAV Images.
- Author
-
Li, Yangang, Li, Qi, Pan, Jie, Zhou, Ying, Zhu, Hongliang, Wei, Hongwei, and Liu, Chong
- Subjects
- *
DRONE aircraft , *COMPUTATIONAL complexity , *ALGORITHMS , *NECK - Abstract
The rapid development of unmanned aerial vehicle (UAV) technology has contributed to the increasing sophistication of UAV-based object-detection systems, which are now extensively utilized in civilian and military sectors. However, object detection from UAV images faces numerous challenges, including significant variations in object size, changing spatial configurations, and cluttered backgrounds with multiple interfering elements. To address these challenges, we propose SOD-YOLO, an innovative model based on the YOLOv8 model, to detect small objects in UAV images. The model integrates the receptive field convolutional block attention module (RFCBAM) in the backbone network to perform downsampling, improving feature extraction efficiency and mitigating the spatial information sparsity caused by downsampling. Additionally, we developed a novel neck architecture called the balanced spatial and semantic information fusion pyramid network (BSSI-FPN) designed for multi-scale feature fusion. The BSSI-FPN effectively balances spatial and semantic information across feature maps using three primary strategies: fully utilizing large-scale features, increasing the frequency of multi-scale feature fusion, and implementing dynamic upsampling. The experimental results on the VisDrone2019 dataset demonstrate that SOD-YOLO-s improves the mAP50 indicator by 3% compared to YOLOv8s while reducing the number of parameters and computational complexity by 84.2% and 30%, respectively. Compared to YOLOv8l, SOD-YOLO-l improves the mAP50 indicator by 7.7% and reduces the number of parameters by 59.6%. Compared to other existing methods, SOD-YOLO-l achieves the highest detection accuracy, demonstrating the superiority of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
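BSSI-FPN's balancing of large-scale (spatial) and deep (semantic) feature maps follows the standard FPN pattern of upsample-then-combine. A minimal sketch with nearest-neighbor upsampling standing in for the paper's dynamic upsampling:

```python
import numpy as np

def upsample_nearest(x, r):
    # Nearest-neighbor upsampling of a (C, H, W) map by integer factor r;
    # a simple stand-in for BSSI-FPN's dynamic upsampling step.
    return np.repeat(np.repeat(x, r, axis=1), r, axis=2)

def fuse_levels(deep, shallow):
    # Upsample the deep (semantic) map to the shallow (spatial) map's
    # resolution and sum them: the basic FPN fusion pattern.
    r = shallow.shape[1] // deep.shape[1]
    return shallow + upsample_nearest(deep, r)
```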
34. Enhanced Dual-Channel Model-Based with Improved Unet++ Network for Landslide Monitoring and Region Extraction in Remote Sensing Images.
- Author
-
Wang, Junxin, Zhang, Qintong, Xie, Hao, Chen, Yingying, and Sun, Rui
- Subjects
- *
CONVOLUTIONAL neural networks , *EMERGENCY management , *LANDSLIDES , *DEEP learning , *FEATURE extraction , *ENVIRONMENTAL disasters - Abstract
Landslide disasters pose significant threats to human life and property; therefore, accurate and effective detection and area extraction methods are crucial in environmental monitoring and disaster management. In our study, we address the critical tasks of landslide detection and area extraction in remote sensing images using advanced deep learning techniques. For landslide detection, we propose an enhanced dual-channel model that leverages EfficientNetB7 for feature extraction and incorporates spatial attention mechanisms (SAMs) to enhance important features. Additionally, we utilize a deep separable convolutional neural network with a Transformers module for feature extraction from digital elevation data (DEM). The extracted features are then fused using a variational autoencoder (VAE) to mine potential features and produce final classification results. Experimental results demonstrate impressive accuracy rates of 98.92% on the Bijie City landslide dataset and 94.70% on the Landslide4Sense dataset. For landslide area extraction, we enhance the traditional Unet++ architecture by incorporating Dilated Convolution to expand the receptive field and enable multi-scale feature extraction. We further integrate the Transformer and Convolutional Block Attention Module to enhance feature focus and introduce multi-task learning, including segmentation and edge detection tasks, to efficiently extract and refine landslide areas. Additionally, conditional random fields (CRFs) are applied for post-processing to refine segmentation boundaries. 
Comparative analysis demonstrates the superior performance of our proposed model over traditional segmentation models such as Unet, Fully Convolutional Network (FCN), and Segnet, as evidenced by improved metrics: IoU of 0.8631, Dice coefficient of 0.9265, overall accuracy (OA) of 91.53%, and Cohen's kappa coefficient of 0.9185 on the Bijie City landslide dataset; and IoU of 0.8217, Dice coefficient of 0.9021, overall accuracy (OA) of 96.68%, and Cohen's kappa coefficient of 0.8835 on the Landslide4Sense dataset. These findings highlight the effectiveness and robustness of our proposed methodologies in addressing critical challenges in landslide detection and area extraction tasks, with significant implications for enhancing disaster management and risk assessment efforts in remote sensing applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
35. Limited Sample Radar HRRP Recognition Using FWA-GAN.
- Author
-
Song, Yiheng, Zhang, Liang, and Wang, Yanhua
- Subjects
- *
ARTIFICIAL neural networks , *GENERATIVE adversarial networks , *RESEARCH personnel , *RADAR - Abstract
In radar High-Resolution Range Profile (HRRP) target recognition, the targets of interest are always non-cooperative, posing a significant challenge in acquiring sufficient samples. This limitation results in the prevalent issue of limited sample availability. To mitigate this problem, researchers have sought to integrate handcrafted features into deep neural networks, thereby augmenting the information content. Nevertheless, existing methodologies for fusing handcrafted and deep features often resort to simplistic addition or concatenation approaches, which fail to fully capitalize on the complementary strengths of both feature types. To address these shortcomings, this paper introduces a novel radar HRRP feature fusion technique grounded in the Feature Weight Assignment Generative Adversarial Network (FWA-GAN) framework. This method leverages the generative adversarial network architecture to facilitate feature fusion in an innovative manner. Specifically, it employs the Feature Weight Assignment Model (FWA) to adaptively assign attention weights to both handcrafted and deep features. This approach enables a more efficient utilization and seamless integration of both feature modalities, thereby enhancing the overall recognition performance under conditions of limited sample availability. As a result, the recognition rate increases by over 4% compared to other state-of-the-art methods on both the simulation and experimental datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. GaitSTAR: Spatial–Temporal Attention-Based Feature-Reweighting Architecture for Human Gait Recognition.
- Author
-
Bilal, Muhammad, Jianbiao, He, Mushtaq, Husnain, Asim, Muhammad, Ali, Gauhar, and ElAffendi, Mohammed
- Subjects
- COMPUTER vision, DEEP learning, FEATURE extraction, BIOMETRIC identification, DISCRIMINANT analysis, GAIT in humans
- Abstract
Human gait recognition (HGR) leverages unique gait patterns to identify individuals, but the effectiveness of this technique can be hindered due to various factors such as carrying conditions, foot shadows, clothing variations, and changes in viewing angles. Traditional silhouette-based systems often neglect the critical role of instantaneous gait motion, which is essential for distinguishing individuals with similar features. We introduce the "Enhanced Gait Feature Extraction Framework (GaitSTAR)", a novel method that incorporates dynamic feature weighting through the discriminant analysis of temporal and spatial features within a channel-wise architecture. Key innovations in GaitSTAR include dynamic stride flow representation (DSFR) to address silhouette distortion, a transformer-based feature set transformation (FST) for integrating image-level features into set-level features, and dynamic feature reweighting (DFR) for capturing long-range interactions. DFR enhances contextual understanding and improves detection accuracy by computing attention distributions across channel dimensions. Empirical evaluations show that GaitSTAR achieves impressive accuracies of 98.5%, 98.0%, and 92.7% under NM, BG, and CL conditions, respectively, with the CASIA-B dataset; 67.3% with the CASIA-C dataset; and 54.21% with the Gait3D dataset. Despite its complexity, GaitSTAR demonstrates a favorable balance between accuracy and computational efficiency, making it a powerful tool for biometric identification based on gait patterns. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
37. Foreign Object Detection and Recognition Algorithm for Airport Runways Based on Multi-Scale Feature Fusion.
- Author
-
Guo, Xiaojing and Zou, Songlin
- Abstract
Copyright of Journal of Computer Engineering & Applications is the property of Beijing Journal of Computer Engineering & Applications Journal Co Ltd. and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF
38. Low-light enhancement method with dual branch feature fusion and learnable regularized attention.
- Author
-
Sun, Yixiang, Ni, Mengyao, Zhao, Ming, Yang, Zhenyu, Peng, Yuanlong, and Cao, Danhua
- Abstract
Restricted by lighting conditions, images captured at night tend to suffer from color aberration, noise, and other unfavorable factors, making subsequent vision-based applications difficult. To solve this problem, we propose a two-stage, size-controllable low-light enhancement method named Dual Fusion Enhancement Net (DFEN). The whole algorithm is built on a double U-Net structure, implementing brightness adjustment and detail revision, respectively. A dual-branch feature fusion module is adopted to enhance its feature extraction and aggregation ability. We also design a learnable regularized attention module to balance the enhancement effect across different regions. Besides, we introduce a cosine training strategy to smooth the transition of the training target from the brightness adjustment stage to the detail revision stage during training. The proposed DFEN is tested on several low-light datasets, and the experimental results demonstrate that the algorithm achieves superior enhancement results with a similar parameter count. It is worth noting that the lightest DFEN model reaches 11 FPS on 1224×1024 images with an RTX 3090 GPU. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
39. A Semi-Supervised Lie Detection Algorithm Based on Integrating Multiple Speech Emotional Features.
- Author
-
Xi, Ji, Yu, Hang, Xu, Zhe, Zhao, Li, and Tao, Huawei
- Subjects
- LIE detectors & detection, DEEP learning, SPEECH, DECEPTION, SPECTROGRAMS
- Abstract
When people tell lies, they often exhibit tension and emotional fluctuations, reflecting a complex psychological state. However, the scarcity of labeled data in datasets and the complexity of deception information pose significant challenges in extracting effective lie features, which severely restrict the accuracy of lie detection systems. To address this, this paper proposes a semi-supervised lie detection algorithm based on integrating multiple speech emotional features. Firstly, Long Short-Term Memory (LSTM) and Auto-Encoder (AE) networks process log-Mel spectrogram features and acoustic statistical features, respectively, to capture the contextual links between similar features. Secondly, a joint attention model is used to learn the complementary relationship among different features to obtain feature representations with richer details. Lastly, the model combines the unsupervised Local Maximum Mean Discrepancy (LMMD) loss with supervised Jeffreys multi-loss optimization to enhance classification performance. Experimental results show that the proposed algorithm achieves better performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
40. Multiscale Tea Disease Detection with Channel–Spatial Attention.
- Author
-
Sun, Yange, Jiang, Mingyi, Guo, Huaping, Zhang, Li, Yao, Jianfeng, Wu, Fei, and Wu, Gaowei
- Abstract
Tea disease detection is crucial for improving the agricultural circular economy. Deep learning-based methods have been widely applied to this task, and the main idea of these methods is to extract multiscale coarse features of diseases using the backbone network and fuse these features through the neck for accurate disease detection. This paper proposes a novel tea disease detection method that enhances feature expression of the backbone network and the feature fusion capability of the neck: (1) constructing an inverted residual self-attention module as a backbone plugin to capture the long-distance dependencies of disease spots on the leaves; and (2) developing a channel–spatial attention module with residual connection in the neck network to enhance the contextual semantic information of fused features in disease images and eliminate complex background noise. For the second step, the proposed channel–spatial attention module uses Residual Channel Attention (RCA) to enhance inter-channel interactions, facilitating discrimination between disease spots and normal leaf regions, and employs spatial attention (SA) to enhance essential areas of tea diseases. Experimental results demonstrate that the proposed method achieved accuracy and mAP scores of 92.9% and 94.6%, respectively. In particular, this method demonstrated improvements of 6.4% in accuracy and 6.2% in mAP compared to the SSD model. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
41. AM-ESRGAN: Super-Resolution Reconstruction of Ancient Murals Based on Attention Mechanism and Multi-Level Residual Network.
- Author
-
Xiao, Ci, Chen, Yajun, Sun, Chaoyue, You, Longxiang, and Li, Rongzhen
- Subjects
- CONVOLUTIONAL neural networks, FEATURE extraction, IMAGE reconstruction, HIGH resolution imaging, DATA mining
- Abstract
To address the issues of blurred edges and contours, insufficient extraction of low-frequency information, and unclear texture details in ancient murals, which lower the murals' ornamental value and limit their research significance, this paper proposes a novel ancient mural super-resolution reconstruction method based on an attention mechanism and a multi-level residual network, termed AM-ESRGAN. The network builds a Multi-Scale Dense Feature Fusion (MDFF) module to adaptively fuse features at different levels for more complete structural information regarding the image. The deep feature extraction module is improved with a new Sim-RRDB module, which expands capacity without increasing complexity. Additionally, a Simple Parameter-Free Attention Module for Convolutional Neural Networks (SimAM) is introduced to address insufficient feature extraction in the nonlinear mapping process of image super-resolution reconstruction. A new feature refinement module (DEABlock) is added to extract image feature information without changing the resolution, thereby avoiding excessive loss of image information and ensuring richer generated details. The experimental results indicate that, at a ×4 scale factor, the proposed method improves PSNR by 3.4738 dB and SSIM by 0.2060 while reducing MSE by 123.8436 and NIQE by 0.1651. At a ×2 scale factor, PSNR improves by 4.0280 dB, SSIM increases by 3.38%, MSE decreases by 62.2746, and NIQE is reduced by 0.1242. Compared to mainstream models, the objective evaluation metrics of the reconstructed images achieve the best results, and the reconstructed ancient mural images exhibit more detailed textures and clearer edges. [ABSTRACT FROM AUTHOR]
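The PSNR figures quoted in this abstract follow the standard definition from mean squared error; a minimal generic sketch (not the paper's evaluation code, peak value assumed to be 255 for 8-bit images):

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")               # identical images: no noise
    return 10.0 * np.log10(peak ** 2 / mse)

a = np.zeros((8, 8))
b = np.full((8, 8), 16.0)                 # constant error of 16 -> MSE = 256
```

A uniform error of 16 gray levels gives an MSE of 256 and hence a PSNR of about 24 dB, which puts gains of 3-4 dB like those reported above into perspective.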
- Published
- 2024
- Full Text
- View/download PDF
42. Highly accurate brain tumor detection with high sensitivity using transform-based functions and machine learning algorithms.
- Author
-
Bhatt, Ashish and Nigam, Vineeta Saxena
- Subjects
- MACHINE learning, OPTIMIZATION algorithms, CLASSIFICATION algorithms, FEATURE selection, FEATURE extraction, BRAIN tumors
- Abstract
Brain tumor is an extremely dangerous disease with a very high mortality rate worldwide. Detecting brain tumors accurately is crucial due to the varying appearance of tumor cells and the dimensional irregularities in their growth, which pose a significant challenge for detection algorithms. Currently, numerous algorithms are used for this purpose, ranging from transform-based methods to those rooted in machine learning techniques; these algorithms aim to enhance detection accuracy despite the complexities involved in identifying brain tumor cells. Their major limitation is the mapping of the extracted brain tumor features into the classification algorithms. This paper employs a combination of transform methods based on sub-band decomposition for texture feature extraction from MRI scans, hybrid feature optimization using the firefly and glow-worm algorithms for feature selection, the MKSVM algorithm and a stacking ensemble classifier for classification, and fusion of the features produced by the different extraction methods. The algorithm has been implemented in MATLAB using the BRATS (Brain Tumor Segmentation) datasets for the years 2013, 2015, and 2018, which serve as the foundation for testing and validation across different time periods and provide a comprehensive assessment of its effectiveness in detecting brain tumors. The proposed algorithm achieves detection accuracy, sensitivity, and specificity of up to 98%, 99%, and 99.5%, respectively. The experimental outcomes showcase the efficiency of the algorithm in brain tumor detection.
The proposed work mainly contributes to brain tumor detection in the following aspects: (a) use of a combination of transform methods for texture feature extraction from MRI scans; (b) hybrid feature selection using the firefly and glow-worm optimization algorithms; (c) employment of the MKSVM algorithm and a stacking ensemble classifier for classification, together with fusion of the features from the different extraction methods. [ABSTRACT FROM AUTHOR]
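The accuracy, sensitivity, and specificity figures reported above follow the standard confusion-matrix definitions; a generic sketch (not the authors' code, counts chosen arbitrarily for illustration):

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)              # true-positive rate (recall)
    specificity = tn / (tn + fp)              # true-negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return accuracy, sensitivity, specificity

# e.g. 90 true positives, 5 false positives, 95 true negatives, 10 false negatives
acc, sens, spec = classification_metrics(90, 5, 95, 10)
```

Reporting sensitivity alongside accuracy matters here because missed tumors (false negatives) are costlier than false alarms, and accuracy alone hides them.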
- Published
- 2024
- Full Text
- View/download PDF
43. Automatic seismic first‐break picking based on multi‐view feature fusion network.
- Author
-
Wu, Yinghe, Pan, Shulin, Lan, Haiqiang, Badal, José, Wei, Ze, and Chen, Yaojie
- Subjects
- ARTIFICIAL intelligence, WORK design, ELECTRONIC data processing, GENERALIZATION, ALGORITHMS
- Abstract
Automatic first-break picking is a basic step in seismic data processing, so much so that the quality of the picking largely determines the effect of subsequent processing. To a certain extent, artificial intelligence technology has overcome the shortcomings of traditional first-break picking algorithms, such as poor applicability and low efficiency. However, problems remain for seismic data with a low signal-to-noise ratio and large first-break variation, leading to inaccurate picking and poor network generalization. In order to improve the accuracy of automatic first-break picking on such data, we propose a multi-view automatic first-break picking method driven by multiple networks. First, we analysed the single-trace boundary characteristics and the two-dimensional boundary characteristics of the first break. Based on these two characteristics, we used the Long Short-Term Memory (LSTM) and residual attention gate UNet networks to extract the characteristics of the first arrival and its location from the seismic data, respectively. Then, we introduced the idea of multi-network learning into the first-break picking task and designed a feature fusion network. Finally, the multi-view first-break features extracted by the LSTM and residual attention gate UNet networks are fused, which effectively improves the picking accuracy. The results obtained after applying the method to field seismic data show that the accuracy of the first break detected by the feature fusion network is higher than that given by either of the two networks alone, and the method has good applicability and resistance to noise. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
44. Attentive context and semantic enhancement mechanism for printed circuit board defect detection with two-stage and multi-stage object detectors.
- Author
-
Kiobya, Twahir, Zhou, Junfeng, Maiseli, Baraka, and Khan, Maqbool
- Subjects
- PRINTED circuits, ELECTRONIC equipment, DETECTORS
- Abstract
Printed Circuit Boards (PCBs) are key devices for modern electronic technologies. During the production of these boards, defects may occur. Several methods have been proposed to detect PCB defects. However, detecting significantly smaller and visually unrecognizable defects has been a long-standing challenge. The existing two-stage and multi-stage object detectors that use only one layer of the backbone, such as ResNet's third layer (C4) or fourth layer (C5), suffer from low accuracy, and those that use multi-layer feature map extractors, such as the Feature Pyramid Network (FPN), incur higher computational cost. Motivated by these challenges, we propose a robust, less computationally intensive, plug-and-play Attentive Context and Semantic Enhancement Module (ACASEM) for two-stage and multi-stage detectors to enhance PCB defect detection. This module consists of two main parts, namely adaptable feature fusion and attention sub-modules. The proposed module, ACASEM, takes in feature maps from different layers of the backbone and fuses them in a way that enriches the resulting feature maps with more context and semantic information. We test our module with state-of-the-art two-stage object detectors, Faster R-CNN and Double-Head R-CNN, and with the multi-stage Cascade R-CNN detector on the DeepPCB and Augmented PCB Defect datasets. Empirical results demonstrate improvement in the accuracy of defect detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
45. A Divide-and-Rule Combined Learning Method for Truly Multivariate Time Series Prediction.
- Author
-
Wei, Bing, Sang, Shiqing, Yao, Liangyong, Gao, Lei, Liu, Yan, Han, Tao, and Li, Jintao
- Subjects
- TIME series analysis, DEEP learning, FORECASTING, COOPERATION
- Abstract
Multivariate time series prediction is a significant research area that aims to forecast future values based on past observations. Deep learning models with attention mechanisms have shown good predictive performance by emphasizing optimal-related sequences in the target series. However, these models ignore mutation information of nontarget sequences and the long short-term dependencies. To this end, a divide-and-rule combined learning method is proposed to address these limitations, which uses differentiated feature extractors to process different implicit features. First, we design a spatial and temporal information extractor to extract the time-dimensional feature information in the separation stage. Then, a multivariate mutation information extractor is constructed by convolution and maximum pooling layer to capture mutation information of nontarget sequences. Subsequently, the decoder component of the encoder-decoder model extracts long short-term dependencies while preserving the information of the target sequence to be predicted. Finally, in the cooperation stage, a feature fusion method based on a point attention mechanism is proposed, which can assign individual weights to each feature point and enhance the ability to focus on local areas. Experimental results on five real datasets in different domains show that the proposed method has better predictive performance compared to other baseline models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
46. Research into the Applications of a Multi-Scale Feature Fusion Model in the Recognition of Abnormal Human Behavior.
- Author
-
Li, Congcong, Li, Yifan, Wang, Bin, and Zhang, Yuting
- Subjects
- HUMAN behavior, OLDER people, MODERN society, MULTISCALE modeling, POPULATION aging
- Abstract
Due to the increasing severity of aging populations in modern society, the accurate and timely identification of, and responses to, sudden abnormal behaviors of the elderly have become an urgent and important issue. In the current research on computer vision-based abnormal behavior recognition, most algorithms have shown poor generalization and recognition abilities in practical applications, as well as issues with recognizing single actions. To address these problems, an MSCS–DenseNet–LSTM model based on a multi-scale attention mechanism is proposed. This model integrates the MSCS (Multi-Scale Convolutional Structure) module into the initial convolutional layer of the DenseNet model to form a multi-scale convolution structure. It introduces the improved Inception X module into the Dense Block to form an Inception Dense structure, and gradually performs feature fusion through each Dense Block module. The CBAM attention mechanism module is added to the dual-layer LSTM to enhance the model's generalization ability while ensuring the accurate recognition of abnormal actions. Furthermore, to address the issue of single-action abnormal behavior datasets, the RGB image dataset RIDS (RGB image dataset) and the contour image dataset CIDS (contour image dataset) containing various abnormal behaviors were constructed. The experimental results validate that the proposed MSCS–DenseNet–LSTM model achieved an accuracy, sensitivity, and specificity of 98.80%, 98.75%, and 98.82% on the two datasets, and 98.30%, 98.28%, and 98.38%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
47. Accurate UAV Small Object Detection Based on HRFPN and EfficientVMamba.
- Author
-
Wu, Shixiao, Lu, Xingyuan, Guo, Chengcheng, and Guo, Hong
- Subjects
- OBJECT recognition (Computer vision), FEATURE extraction, DEEP learning, PYRAMIDS, ALGORITHMS
- Abstract
(1) Background: Small objects in Unmanned Aerial Vehicle (UAV) images are often scattered throughout various regions of the image, such as the corners, and may be blocked by larger objects, as well as susceptible to image noise. Moreover, due to their small size, these objects occupy a limited area in the image, resulting in a scarcity of effective features for detection. (2) Methods: To address the detection of small objects in UAV imagery, we introduce a novel algorithm called High-Resolution Feature Pyramid Network Mamba-Based YOLO (HRMamba-YOLO). This algorithm leverages the strengths of a High-Resolution Network (HRNet), EfficientVMamba, and YOLOv8, integrating a Double Spatial Pyramid Pooling (Double SPP) module, an Efficient Mamba Module (EMM), and a Fusion Mamba Module (FMM) to enhance feature extraction and capture contextual information. Additionally, a new multi-scale feature fusion network, the High-Resolution Feature Pyramid Network (HRFPN), together with the FMM, improves feature interactions and enhances the performance of small object detection. (3) Results: On the VisDroneDET dataset, the proposed algorithm achieved a 4.4% higher Mean Average Precision (mAP) than YOLOv8-m. The experimental results showed that HRMamba achieved a mAP of 37.1% on the Dota1.5 dataset, surpassing YOLOv8-m by 3.8%. On the UCAS_AOD dataset and the DIOR dataset, our model had a mAP 1.5% and 0.3% higher than the YOLOv8-m model, respectively. For a fair comparison, all models were trained without pre-trained weights. (4) Conclusions: This study not only highlights the exceptional performance and efficiency of HRMamba-YOLO in small object detection tasks but also provides innovative solutions and valuable insights for future research. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
48. HP-YOLOv8: High-Precision Small Object Detection Algorithm for Remote Sensing Images.
- Author
-
Yao, Guangzhen, Zhu, Sandong, Zhang, Long, and Qi, Miao
- Subjects
- OBJECT recognition (Computer vision), REMOTE sensing, ALGORITHMS, NOISE, BRASSIERES
- Abstract
YOLOv8, as an efficient object detection method, can swiftly and precisely identify objects within images. However, traditional algorithms encounter difficulties when detecting small objects in remote sensing images, such as missing information, background noise, and interactions among multiple objects in complex scenes, which may affect performance. To tackle these challenges, we propose an enhanced algorithm optimized for detecting small objects in remote sensing images, named HP-YOLOv8. Firstly, we design the C2f-D-Mixer (C2f-DM) module as a replacement for the original C2f module. This module integrates both local and global information, significantly improving the ability to detect features of small objects. Secondly, we introduce a feature fusion technique based on attention mechanisms, named Bi-Level Routing Attention in Gated Feature Pyramid Network (BGFPN). This technique utilizes an efficient feature aggregation network and reparameterization technology to optimize information interaction between different scale feature maps, and through the Bi-Level Routing Attention (BRA) mechanism, it effectively captures critical feature information of small objects. Finally, we propose the Shape Mean Perpendicular Distance Intersection over Union (SMPDIoU) loss function. The method comprehensively considers the shape and size of detection boxes, enhances the model's focus on the attributes of detection boxes, and provides a more accurate bounding box regression loss calculation method. To demonstrate our approach's efficacy, we conducted comprehensive experiments across the RSOD, NWPU VHR-10, and VisDrone2019 datasets. The experimental results show that the HP-YOLOv8 achieves 95.11%, 93.05%, and 53.49% in the mAP@0.5 metric, and 72.03%, 65.37%, and 38.91% in the more stringent mAP@0.5:0.95 metric, respectively. [ABSTRACT FROM AUTHOR]
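The SMPDIoU loss described above builds on the standard Intersection over Union between axis-aligned boxes; for background, plain IoU can be sketched as follows (a generic illustration, not the proposed loss):

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2) corners."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # clamp: empty overlap -> 0
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

Variants such as SMPDIoU add shape- and size-aware penalty terms on top of this ratio, because plain IoU gives zero gradient for non-overlapping boxes and ignores box geometry.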
- Published
- 2024
- Full Text
- View/download PDF
49. CMFPNet: A Cross-Modal Multidimensional Frequency Perception Network for Extracting Offshore Aquaculture Areas from MSI and SAR Images.
- Author
-
Yu, Haomiao, Wang, Fangxiong, Hou, Yingzi, Wang, Junfu, Zhu, Jianfeng, and Cui, Zhenqi
- Subjects
- MARINE resources conservation, MULTISPECTRAL imaging, REMOTE sensing, RECOMMENDER systems, SYNTHETIC aperture radar, ENVIRONMENTAL protection planning
- Abstract
The accurate extraction and monitoring of offshore aquaculture areas are crucial for the marine economy, environmental management, and sustainable development. Existing methods relying on unimodal remote sensing images are limited by natural conditions and sensor characteristics. To address this issue, we integrated multispectral imaging (MSI) and synthetic aperture radar imaging (SAR) to overcome the limitations of single-modal images. We propose a cross-modal multidimensional frequency perception network (CMFPNet) to enhance classification and extraction accuracy. CMFPNet includes a local–global perception block (LGPB) for combining local and global semantic information and a multidimensional adaptive frequency filtering attention block (MAFFAB) that dynamically filters frequency-domain information that is beneficial for aquaculture area recognition. We constructed six typical offshore aquaculture datasets and compared CMFPNet with other models. The quantitative results showed that CMFPNet outperformed the existing methods in terms of classifying and extracting floating raft aquaculture (FRA) and cage aquaculture (CA), achieving mean intersection over union (mIoU), mean F1 score (mF1), and mean Kappa coefficient (mKappa) values of 87.66%, 93.41%, and 92.59%, respectively. Moreover, CMFPNet has low model complexity and successfully achieves a good balance between performance and the number of required parameters. Qualitative results indicate significant reductions in missed detections, false detections, and adhesion phenomena. Overall, CMFPNet demonstrates great potential for accurately extracting large-scale offshore aquaculture areas, providing effective data support for marine planning and environmental protection. Our code is available in the Data Availability Statement section. [ABSTRACT FROM AUTHOR]
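The Kappa coefficient reported above is Cohen's kappa, which scores agreement between predicted and true classes after discounting chance agreement; a minimal generic sketch from a confusion matrix (not the authors' evaluation code):

```python
import numpy as np

def cohens_kappa(cm):
    """Cohen's kappa from a square confusion matrix (rows: truth, cols: prediction)."""
    cm = np.asarray(cm, dtype=np.float64)
    n = cm.sum()
    p_o = np.trace(cm) / n                                  # observed agreement
    p_e = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2  # chance agreement
    return (p_o - p_e) / (1.0 - p_e)
```

Unlike overall accuracy, kappa stays low when a model merely predicts the majority class, which is why it is a common companion metric in land-cover and aquaculture mapping.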
- Published
- 2024
- Full Text
- View/download PDF
50. LWSDNet: A Lightweight Wheat Scab Detection Network Based on UAV Remote Sensing Images.
- Author
-
Yin, Ning, Bao, Wenxia, Yang, Rongchao, Wang, Nian, and Liu, Wenqiang
- Subjects
- REMOTE-sensing images, FIELD crops, REMOTE sensing, INFORMATION networks, IMAGE intensifiers
- Abstract
Wheat scab can reduce wheat yield and quality. Currently, unmanned aerial vehicles (UAVs) are widely used for monitoring field crops. However, UAVs are constrained by the limited computational resources on board the platforms, and compared to ground images, UAV images have complex backgrounds and smaller targets. Given these challenges, this paper proposes a lightweight wheat scab detection network for UAV remote-sensing images, called LWSDNet, which uses mixed deep convolution (MixConv) to monitor wheat scab in field environments. Overlapping cropping and image contrast enhancement methods are designed to preprocess the UAV images. MixConv can significantly reduce the parameters of the LWSDNet network through depthwise convolution and pointwise convolution, and kernels of different sizes can extract rich scab features. To enable LWSDNet to extract more scab features, a scab feature enhancement module, which utilizes spatial attention and dilated convolution, is designed to improve the network's ability to extract scab features. A MixConv adaptive feature fusion module is designed to accurately detect lesions of different sizes, fully utilizing the semantic and detailed information in the network to enable more accurate detection. During training, a knowledge distillation strategy that integrates scab features and responses is employed to further improve the average precision of LWSDNet detection. Experimental results demonstrate that the average precision of LWSDNet in detecting wheat scab is 79.8%, higher than that of common object detection models and lightweight object detection models. The parameters of LWSDNet are only 3.2 million (M), generally lower than those of existing lightweight object detection networks. [ABSTRACT FROM AUTHOR]
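The parameter savings from the depthwise-plus-pointwise decomposition that MixConv relies on can be illustrated with a simple count (standard arithmetic for separable convolutions, not LWSDNet's exact configuration; layer sizes are hypothetical):

```python
def conv_params(k, c_in, c_out):
    """Parameters of a standard k x k convolution layer (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k conv (one filter per input channel) + 1x1 pointwise conv."""
    return k * k * c_in + c_in * c_out

# A 3x3 layer mapping 64 -> 128 channels:
standard = conv_params(3, 64, 128)                   # 3*3*64*128
separable = depthwise_separable_params(3, 64, 128)   # 3*3*64 + 64*128
```

For this layer the separable form needs roughly an eighth of the parameters, which is the kind of reduction that keeps a network like LWSDNet within on-board UAV compute budgets.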
- Published
- 2024
- Full Text
- View/download PDF