Descriptor: "feature pyramid" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"feature pyramid"' showing total 418 results

1. Rethinking Features-Fused-Pyramid-Neck for Object Detection

Author: Li, Hulin; series editor: Goos, Gerhard; founding editor: Hartmanis, Juris; editorial board members: Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti; editors: Leonardis, Aleš, Ricci, Elisa, Roth, Stefan, Russakovsky, Olga, Sattler, Torsten, and Varol, Gül
Published: 2025
Full Text: View/download PDF

2. HistoNeXt: dual-mechanism feature pyramid network for cell nuclear segmentation and classification.

Author: Chen, Junxiao, Wang, Ruixue, Dong, Wei, He, Hua, and Wang, Shiyong
Abstract: Purpose: To develop an end-to-end convolutional neural network model for analyzing hematoxylin and eosin(H&E)-stained histological images, enhancing the performance and efficiency of nuclear segmentation and classification within the digital pathology workflow. Methods: We propose a dual-mechanism feature pyramid fusion technique that integrates nuclear segmentation and classification tasks to construct the HistoNeXt network model. HistoNeXt utilizes an encoder-decoder architecture, where the encoder, based on the advanced ConvNeXt convolutional framework, efficiently and accurately extracts multi-level abstract features from tissue images. These features are subsequently shared with the decoder. The decoder employs a dual-mechanism architecture: The first branch of the mechanism splits into two parallel paths for nuclear segmentation, producing nuclear pixel (NP) and horizontal and vertical distance (HV) predictions, while the second mechanism branch focuses on type prediction (TP). The NP and HV branches leverage densely connected blocks to facilitate layer-by-layer feature transmission and reuse, while the TP branch employs channel attention to adaptively focus on critical features. Comprehensive data augmentation including morphology-preserving geometric transformations and adaptive H&E channel adjustments was applied. To address class imbalance, type-aware sampling was applied. The model was evaluated on public tissue image datasets including CONSEP, PanNuke, CPM17, and KUMAR. The performance in nuclear segmentation was evaluated using the Dice Similarity Coefficient (DICE), the Aggregated Jaccard Index (AJI) and Panoptic Quality (PQ), and the classification performance was evaluated using F1 scores and category-specific F1 scores. In addition, computational complexity, measured in Giga Floating Point Operations Per Second (GFLOPS), was used as an indicator of resource consumption. 
Results: HistoNeXt demonstrated competitive performance across multiple datasets: achieving a DICE score of 0.874, an AJI of 0.722, and a PQ of 0.689 on the CPM17 dataset; a DICE score of 0.826, an AJI of 0.625, and a PQ of 0.565 on KUMAR; and performance comparable to Transformer-based models, such as CellViT-SAM-H, on PanNuke, with a binary PQ of 0.6794, a multi-class PQ of 0.4940, and an overall F1 score of 0.82. On the CONSEP dataset, it achieved a DICE score of 0.843, an AJI of 0.592, a PQ of 0.532, and an overall classification F1 score of 0.773. Specific F1 scores for various cell types were as follows: 0.653 for malignant or dysplastic epithelial cells, 0.516 for normal epithelial cells, 0.659 for inflammatory cells, and 0.587 for spindle cells. The tiny model's complexity was 33.7 GFLOPS. Conclusion: By integrating novel convolutional technology and employing a pyramid fusion of dual-mechanism characteristics, HistoNeXt enhances both the precision and efficiency of nuclear segmentation and classification. Its low computational complexity makes the model well suited for local deployment in resource-constrained environments, thereby supporting a broad spectrum of clinical and research applications. This represents a significant advance in the application of convolutional neural networks in digital pathology analysis. [ABSTRACT FROM AUTHOR]
Published: 2025
Full Text: View/download PDF

3. Multimodal Medical Image Fusion Based on GAN and Multi-Scale Spatial Attention.

Author: 林予松, 李孟娅, 李英豪, and 赵哲
Abstract: Copyright of Journal of Zhengzhou University: Engineering Science is the property of Editorial Office of Journal of Zhengzhou University: Engineering Science and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Published: 2025
Full Text: View/download PDF

4. Small Object Detection in Remote Sensing Images Combining a Dual Attention Mechanism and a Bidirectional Feature Pyramid.

Author: 李, 科文, 朱, 光磊, 王, 辉, 祝, 锐, 狄, 兮尧, 张, 天健, and 薛, 朝辉
Subjects: DETECTION algorithms, REMOTE sensing, DEEP learning, FEATURE extraction, SPATIAL resolution
Abstract: Copyright of Journal of Remote Sensing is the property of Editorial Office of Journal of Remote Sensing & Science Publishing Co.
Published: 2024
Full Text: View/download PDF

5. A Small-Scale Object Detection Algorithm in Intelligent Transportation Scenarios.

Author: Song, Junzi, Han, Chunyan, and Wu, Chenni
Subjects: *OBJECT recognition (Computer vision), *K-means clustering, *ENTROPY (Information theory), *ALGORITHMS, *PYRAMIDS
Abstract: In response to the problem of poor detection ability of object detection models for small-scale targets in intelligent transportation scenarios, a fusion method is proposed to enhance the features of small-scale targets, starting from feature utilization and fusion methods. The algorithm is based on the YOLOv4 tiny framework and enhances the utilization of shallow and mid-level features on the basis of Feature Pyramid Network (FPN), improving the detection accuracy of small and medium-sized targets. In view of the problem that the background of the intelligent traffic scene image is cluttered, and there is more redundant information, the Convolutional Block Attention Module (CBAM) is used to improve the attention of the model to the traffic target. To address the problem of data imbalance and prior bounding box adaptation in custom traffic data sets that expand traffic images in COCO and VOC, we propose a Copy-Paste method with an improved generation method and a K-means algorithm with improved distance measurement to enhance the model's detection ability for corresponding categories. Comparative experiments were conducted on a customized 260-thousand traffic data set containing public traffic images, and the results showed that compared to YOLOv4 tiny, the proposed algorithm improved mAP by 4.9% while still ensuring the real-time performance of the model. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

6. ChartLine: Automatic Detection and Tracing of Curves in Scientific Line Charts Using Spatial-Sequence Feature Pyramid Network.

Author: Yang, Wenjin, He, Jie, and Li, Qian
Subjects: *COMMERCIAL documents, *DATA extraction, *PYRAMIDS, *DATA visualization, *PLAGIARISM, *CURVES
Abstract: Line charts are prevalent in scientific documents and commercial data visualization, serving as essential tools for conveying data trends. Automatic detection and tracing of line paths in these charts is crucial for downstream tasks such as data extraction, chart quality assessment, plagiarism detection, and visual question answering. However, line graphs present unique challenges due to their complex backgrounds and diverse curve styles, including solid, dashed, and dotted lines. Existing curve detection algorithms struggle to address these challenges effectively. In this paper, we propose ChartLine, a novel network designed for detecting and tracing curves in line graphs. Our approach integrates a Spatial-Sequence Attention Feature Pyramid Network (SSA-FPN) in both the encoder and decoder to capture rich hierarchical representations of curve structures and boundary features. The model incorporates a Spatial-Sequence Fusion (SSF) module and a Channel Multi-Head Attention (CMA) module to enhance intra-class consistency and inter-class distinction. We evaluate ChartLine on four line chart datasets and compare its performance against state-of-the-art curve detection, edge detection, and semantic segmentation methods. Extensive experiments demonstrate that our method significantly outperforms existing algorithms, achieving an F-measure of 94% on a synthetic dataset. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

7. Network and Dataset for Multiscale Remote Sensing Image Change Detection

Author: Shenbo Liu, Dongxue Zhao, Yuheng Zhou, Ying Tan, Huang He, Zhao Zhang, and Lijun Tang
Subjects: Attention mechanism, change detection dataset (CDD), feature pyramid, multiscale change detection, remote sensing images, Ocean engineering, TC1501-1800, Geophysics. Cosmic physics, QC801-809
Abstract: Remote sensing image change detection (RSCD) aims to identify differences between remote sensing images of the same location at different times. However, due to the significant variations in the size and appearance of objects in real-world scenes, existing RSCD algorithms often lack strong capabilities in extracting multiscale features, thereby failing to fully capture the characteristics of changes. To address this issue, a multiscale remote sensing change detection network (MSNet) and a multiscale RSCD dataset (MSRS-CD) are proposed. A multiscale convolution module (MSCM) is investigated, and combined with MSCM, an encoder capable of capturing features of different sizes is designed to efficiently extract multiscale semantic change features. A global multiscale feature fusion module is designed to achieve global multiscale feature fusion and obtain multiscale high-level semantic change features. As existing RSCD datasets lack rich scale information and often focus on change targets of specific sizes, a new dataset, MSRS-CD, is constructed. This dataset consists of 842 pairs of images with a resolution of 1024 × 1024 pixels, featuring uniformly distributed change detection target sizes. Comparative experiments are conducted with 10 other state-of-the-art algorithms on the MSRS-CD dataset and another public dataset, LEVIR-CD. Experimental results demonstrate that MSNet achieves the best performance on both datasets, with an F1 score of 75.74% on the MSRS-CD dataset and 91.41% on the LEVIR-CD dataset.
Published: 2025
Full Text: View/download PDF

8. HistoNeXt: dual-mechanism feature pyramid network for cell nuclear segmentation and classification

Author: Junxiao Chen, Ruixue Wang, Wei Dong, Hua He, and Shiyong Wang
Subjects: Nuclear segmentation, Nuclear classification, Feature pyramid, Convolutional neural network, Digital pathology, Medical technology, R855-855.5
Abstract: Purpose: To develop an end-to-end convolutional neural network model for analyzing hematoxylin and eosin(H&E)-stained histological images, enhancing the performance and efficiency of nuclear segmentation and classification within the digital pathology workflow. Methods: We propose a dual-mechanism feature pyramid fusion technique that integrates nuclear segmentation and classification tasks to construct the HistoNeXt network model. HistoNeXt utilizes an encoder-decoder architecture, where the encoder, based on the advanced ConvNeXt convolutional framework, efficiently and accurately extracts multi-level abstract features from tissue images. These features are subsequently shared with the decoder. The decoder employs a dual-mechanism architecture: The first branch of the mechanism splits into two parallel paths for nuclear segmentation, producing nuclear pixel (NP) and horizontal and vertical distance (HV) predictions, while the second mechanism branch focuses on type prediction (TP). The NP and HV branches leverage densely connected blocks to facilitate layer-by-layer feature transmission and reuse, while the TP branch employs channel attention to adaptively focus on critical features. Comprehensive data augmentation including morphology-preserving geometric transformations and adaptive H&E channel adjustments was applied. To address class imbalance, type-aware sampling was applied. The model was evaluated on public tissue image datasets including CONSEP, PanNuke, CPM17, and KUMAR. The performance in nuclear segmentation was evaluated using the Dice Similarity Coefficient (DICE), the Aggregated Jaccard Index (AJI) and Panoptic Quality (PQ), and the classification performance was evaluated using F1 scores and category-specific F1 scores. In addition, computational complexity, measured in Giga Floating Point Operations Per Second (GFLOPS), was used as an indicator of resource consumption.
Results: HistoNeXt demonstrated competitive performance across multiple datasets: achieving a DICE score of 0.874, an AJI of 0.722, and a PQ of 0.689 on the CPM17 dataset; a DICE score of 0.826, an AJI of 0.625, and a PQ of 0.565 on KUMAR; and performance comparable to Transformer-based models, such as CellViT-SAM-H, on PanNuke, with a binary PQ of 0.6794, a multi-class PQ of 0.4940, and an overall F1 score of 0.82. On the CONSEP dataset, it achieved a DICE score of 0.843, an AJI of 0.592, a PQ of 0.532, and an overall classification F1 score of 0.773. Specific F1 scores for various cell types were as follows: 0.653 for malignant or dysplastic epithelial cells, 0.516 for normal epithelial cells, 0.659 for inflammatory cells, and 0.587 for spindle cells. The tiny model’s complexity was 33.7 GFLOPS. Conclusion: By integrating novel convolutional technology and employing a pyramid fusion of dual-mechanism characteristics, HistoNeXt enhances both the precision and efficiency of nuclear segmentation and classification. Its low computational complexity makes the model well suited for local deployment in resource-constrained environments, thereby supporting a broad spectrum of clinical and research applications. This represents a significant advance in the application of convolutional neural networks in digital pathology analysis.
Published: 2025
Full Text: View/download PDF

9. Global remote feature modulation end-to-end detection

Author: XiaoAn Bao, WenJing Yi, XiaoMei Tu, Na Zhang, QingQi Zhang, YuTing Jin, and Biao Wu
Subjects: Object detection, Attention mechanism, Feature pyramid, Pig detection, Medicine, Science
Abstract: Object detectors based on fully convolutional networks achieve excellent performance. However, existing detection algorithms still face challenges such as low detection accuracy in dense scenes and issues with occlusion of dense targets. To address these two challenges, we propose a Global Remote Feature Modulation End-to-End (GRFME2E) detection algorithm. In the feature extraction phase of our algorithm, we introduce the Concentric Attention Feature Pyramid Network (CAFPN). The CAFPN captures direction-aware and position-sensitive information, as well as global remote dependencies of features in deep layers by combining Coordinate Attention and Multilayer Perceptron. These features are used to modulate the front-end shallow features, enhancing inter-layer feature adjustment to obtain comprehensive and distinctive feature representations. In the detector part, we introduce the Two-Stage Detection Head (TS Head). This head employs the First-One-to-Few (F-O2F) module to detect slightly or unobstructed objects. Additionally, it uses masks to suppress already detected instances, and then feeds them to the Second-One-to-Few (S-O2F) module to identify those that are heavily occluded. The results from both detection stages are merged to produce the final output, ensuring the detection of objects whether they are slightly obscured, unobstructed, or heavily occluded. Experimental results on the pig detection dataset demonstrate that our GRFME2E achieves an accuracy of 98.4%. In addition, more extensive experimental results show that on the CrowdHuman dataset, our GRFME2E achieves 91.8% and outperforms other methods.
Published: 2024
Full Text: View/download PDF

10. YOLO-ML: A Slide Rail Defect Detection Method Based on a Multi-Scale Feature-Layer Attention Mechanism.

Author: 王月, 刘永旭, 王鹏, 银兴行, and 杨欢
Subjects: ON-site evaluation, PYRAMIDS, NOISE, AUTOMOBILES, PULLEYS
Abstract: Copyright of Journal of Chongqing University of Posts & Telecommunications (Natural Science Edition) is the property of Chongqing University of Posts & Telecommunications.
Published: 2024
Full Text: View/download PDF

11. Improving YOLOX network for multi-scale fire detection.

Author: Wang, Taofang, Wang, Jun, Wang, Chao, Lei, Yi, Cao, Rui, and Wang, Li
Subjects: *FALSE alarms, *CONVOLUTIONAL neural networks, *FIRE detectors, *FOREST fires, *FOREST protection, *DATA augmentation, *NATURAL disasters
Abstract: Forest fire is a severe natural disaster, which leads to the destruction of forest ecology. At present, fire detection technology represented by convolutional neural network is widely used in forest resource protection, which can realize rapid analysis. However, in forest flame and smoke detection tasks, due to continuous expansion of the target range, a better detection effect cannot be achieved. This paper proposes an improved YOLOX method for multi-scale forest fire detection. This method proposes a novel feature pyramid model to reduce the information loss of high-level forest fire feature maps and enhance the representation ability of feature pyramids. Moreover, the method applies a small object data augmentation strategy to enrich the forest fire dataset, making it more suitable for the actual forest fire scene. According to the experimental results, the mAP of the model proposed in this paper reaches 79.64%, which is about 4.89% higher than the baseline network YOLOX. The method improves the accuracy of forest fire detection, reduces false alarms, and is suitable for real scenarios of forest fires. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

12. Research on an Intelligent Control Method for Nursing Beds Based on a Feature Pyramid Network and Raspberry Pi.

Author: 杜特 and 宋扬
Abstract: Copyright of Computer Measurement & Control is the property of Magazine Agency of Computer Measurement & Control.
Published: 2024
Full Text: View/download PDF

13. Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume.

Author: Han, Ming, Yin, Hui, Chong, Aixin, and Du, Qianqian
Subjects: CASCADE connections, REFERENCE sources, PYRAMIDS, INTENTION, COST
Abstract: Multi-level features are commonly employed in the cascade network, which is currently the dominant framework in multi-view stereo (MVS). However, there is a potential issue that the recent popular multi-level feature extractor network overlooks the significance of fine-grained structure features for coarse depth inferences in the MVS task. Discriminative structure features play an important part in matching and are helpful to boost the performance of depth inference. In this work, we propose an effective cascade-structured MVS model named FANet, where an enhanced feature pyramid is built with the intention of predicting reliable initial depth values. Specifically, the features from deep layers are enhanced with affluent spatial structure information in shallow layers by a bottom-up feature enhancement path. For the enhanced topmost features, an attention mechanism is additionally employed to suppress redundant information and select important features for subsequent matching. To ensure the lightweight and optimal performance of the entire model, an efficient module is built to construct a lightweight and effective cost volume, representing viewpoint correspondence reliably, by utilizing the average similarity metric to calculate feature correlations between reference view and source views and then adaptively aggregating them into a unified correlation cost volume. Extensive quantitative and qualitative comparisons on the DTU and Tanks & Temples benchmarks illustrate that the proposed model exhibits better reconstruction quality than state-of-the-art MVS methods. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

14. Global remote feature modulation end-to-end detection.

Author: Bao, XiaoAn, Yi, WenJing, Tu, XiaoMei, Zhang, Na, Zhang, QingQi, Jin, YuTing, and Wu, Biao
Abstract: Object detectors based on fully convolutional networks achieve excellent performance. However, existing detection algorithms still face challenges such as low detection accuracy in dense scenes and issues with occlusion of dense targets. To address these two challenges, we propose a Global Remote Feature Modulation End-to-End (GRFME2E) detection algorithm. In the feature extraction phase of our algorithm, we introduce the Concentric Attention Feature Pyramid Network (CAFPN). The CAFPN captures direction-aware and position-sensitive information, as well as global remote dependencies of features in deep layers by combining Coordinate Attention and Multilayer Perceptron. These features are used to modulate the front-end shallow features, enhancing inter-layer feature adjustment to obtain comprehensive and distinctive feature representations. In the detector part, we introduce the Two-Stage Detection Head (TS Head). This head employs the First-One-to-Few (F-O2F) module to detect slightly or unobstructed objects. Additionally, it uses masks to suppress already detected instances, and then feeds them to the Second-One-to-Few (S-O2F) module to identify those that are heavily occluded. The results from both detection stages are merged to produce the final output, ensuring the detection of objects whether they are slightly obscured, unobstructed, or heavily occluded. Experimental results on the pig detection dataset demonstrate that our GRFME2E achieves an accuracy of 98.4%. In addition, more extensive experimental results show that on the CrowdHuman dataset, our GRFME2E achieves 91.8% and outperforms other methods. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

15. Occluded Face Recognition Based on Segmentation and Multi-stage Mask Learning

Author: ZHANG Zheng, LU Tianliang, and CAO Jinxuan
Subjects: occluded face recognition, multi-stage mask learning, occlusion detection and segmentation, feature pyramid, Electronic computers. Computer science, QA75.5-76.95
Abstract: Existing face recognition methods cannot effectively eliminate the influence of corrupted features caused by occlusion. As the features flow deeper, the corrupted features get entangled with the effective features used for identity classification, which affects the recognition results. To address the problem, this paper designs an occluded face recognition method based on segmentation and multi-stage mask learning strategy. The model consists of three components: occlusion detection and segmentation, feature extraction, and mask learning unit. The proposed method only needs one end-to-end process to learn feature masks and deep occlusion-robust features without relying on additional occlusion detectors. The mask learning units take different sizes of occlusion segmentation representations and facial features of different stages as input, generate corresponding feature masks for different stages of feature extraction, and effectively eliminate the influence of corrupted features caused by occlusion at each stage of feature extraction through mask operations. Finally, a feature pyramid is constructed to fuse features of different stages for identity classification. Experimental results show that the proposed method can effectively improve the accuracy of occluded face recognition. The accuracy on the occluded LFW dataset and the real masked datasets MFR2 and Mask_whn reach 98.77%, 96.70% and 81.53%, respectively, which has an accuracy improvement of 2.04, 0.48 and 4.44 percentage points compared with the existing mainstream methods.
Published: 2024
Full Text: View/download PDF

16. Few-Shot Steel Defect Detection Based on a Fine-Tuned Network with Serial Multi-Scale Attention.

Author: Liu, Xiangpeng, Jiao, Lei, Peng, Yulin, An, Kang, Wang, Danning, Lu, Wei, and Han, Jianjiao
Subjects: MACHINE learning, STEEL, SUPERVISED learning, DEEP learning, SURFACE defects, VIRTUAL networks
Abstract: Detecting defects on a steel surface is crucial for the quality enhancement of steel, but its effectiveness is impeded by the limited number of high-quality samples, diverse defect types, and the presence of interference factors such as dirt spots. Therefore, this article proposes a fine-tuned deep learning approach to overcome these obstacles in unstructured few-shot settings. Initially, to address steel surface defect complexities, we integrated a serial multi-scale attention mechanism, concatenating attention and spatial modules, to generate feature maps that contain both channel information and spatial information. Further, a pseudo-label semi-supervised learning algorithm (SSL) based on a variant of the locally linear embedding (LLE) algorithm was proposed, enhancing the generalization capability of the model through information from unlabeled data. Afterwards, the refined model was merged into a fine-tuned few-shot object detection network, which applied extensive base class samples for initial training and sparse new class samples for fine-tuning. Finally, specialized datasets considering defect diversity and pixel scales were constructed and tested. Compared with conventional methods, our approach improved accuracy by 5.93% in 7-shot detection tasks, markedly reducing manual workload and signifying a leap forward for practical applications in steel defect detection. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

17. MI-RPN: Integrating multi-modalities and multi-scales information for region proposal.

Author: Tian, Shishun, Chen, Ruifeng, Zou, Wenbin, and Li, Xia
Subjects: ADDITION (Mathematics), PYRAMIDS
Abstract: Region proposal is crucial for the two-stage object detectors. Recently, the RGB-based region proposal approaches have achieved impressive progress. However, they still suffer from two problems: (1) RGB images only contain the texture information of objects, while the 3D geometric structure information, which is also important for detection, is neglected. (2) In a typical Feature Pyramid Network (FPN), the upsampling operation only models the corresponding relation between adjacent locations, and the texture structure is not taken into consideration. Besides, the addition operation in FPN ignores the importance of different channels, which may affect the propagation of semantic information. In this paper, we propose a Region Proposal Network using Multi-modalities and multi-scales Information (named MI-RPN). Firstly, we propose a Gate-guided Fusion Module (GFM) to fuse the RGB and depth features which respectively contain the texture and geometric information. Secondly, we propose a Flow-guided Upsample Feature Pyramid Network (FUFPN) to optimize the multi-scales feature fusion in typical FPN by taking features of an adjacent layer into consideration. Experimental results on SUNRGBD, NYUv2, and KITTI show that MI-RPN achieves superior results compared to current state-of-the-art methods. Besides, we replace the RPN in typical two-stage object detection models to test the effectiveness of the proposed MI-RPN. The results show that MI-RPN can significantly improve the accuracy of two-stage object detection models. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

18. Optimal path for automated pedestrian detection: image deblurring algorithm based on generative adversarial network.

Author: Xiujuan Dong and Jianping Lan
Subjects: *GENERATIVE adversarial networks, *PEDESTRIANS, *SIGNAL-to-noise ratio, *ALGORITHMS
Abstract: Pedestrian detection for automated driving still faces challenges. To address the deblurring of specific targets in an image, this research builds a pedestrian detection deblurring model based on a generative adversarial network and multi-scale convolution. First, it designs an image deblurring algorithm based on a generative adversarial network. Then, building on image deblurring, a pedestrian deblurring algorithm based on multi-scale convolution is designed to focus on deblurring the pedestrians in the image. The results show that the image deblurring algorithm based on the generative adversarial network achieves the highest peak signal-to-noise ratio (PSNR) and structural similarity index, 29.7 dB and 0.943 respectively, and the shortest operation time, 0.50 s. The pedestrian deblurring algorithm based on multi-scale convolution has the highest PSNR and structural similarity on the HIDE test set and the GoPro dataset, with 29.4 dB and 0.925, and 40.45 dB and 0.992, respectively. The restored image is the clearest and has the best visual effect. The enlarged part of the face reveals more detailed information and is the closest to a real sharp image. The deblurring effect is not limited by the size of the pedestrians in the image. In summary, the model constructed in this study performs well in image deblurring and pedestrian detection, and contributes to the development of autonomous driving technology. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

19. AP-Net: a metallic surface defect detection approach with lightweight adaptive attention and enhanced feature pyramid.

Author: Chen, Faquan, Deng, Miaolei, Gao, Hui, Yang, Xiaoya, and Zhang, Dexian
Subjects: *SURFACE defects, *METALLIC surfaces, *PYRAMIDS, *OBJECT recognition (Computer vision), *DETECTORS
Abstract: Surface defect detection is essential for ensuring the quality of metallic products. Many excellent surface defect detectors have been designed in recent years. Most detection methods achieve success by using attention and feature pyramid modules. However, most attention modules consider only simple global information in channel features and incur heavy computational costs. Furthermore, the existing feature pyramids fail to effectively utilize the information from all multi-level features. To alleviate these issues, we design a metallic surface defect detector, named AP-Net. Specifically, we propose a lightweight adaptive attention module (LAA) and an enhanced feature pyramid module (EFP). We perform extensive experiments on three datasets. The experimental results show that the detection accuracies of the representative detectors can be significantly improved using our LAA and EFP. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

20. WH-DETR: An Efficient Network Architecture for Wheat Spike Detection in Complex Backgrounds.

Author: Yang, Zhenlin, Yang, Wanhong, Yi, Jizheng, and Liu, Rong
Subjects: WHEAT, DATA augmentation, TRANSFORMER models, PRECISION farming, PYRAMIDS
Abstract: Wheat spike detection is crucial for estimating wheat yields and has a significant impact on the modernization of wheat cultivation and the advancement of precision agriculture. This study explores the application of the DETR (Detection Transformer) architecture in wheat spike detection, introducing a new perspective to this task. We propose a high-precision end-to-end network named WH-DETR, which is based on an enhanced RT-DETR architecture. Initially, we employ data augmentation techniques such as image rotation, scaling, and random occlusion on the GWHD2021 dataset to improve the model's generalization across various scenarios. A lightweight feature pyramid, GS-BiFPN, is implemented in the network's neck section to effectively extract the multi-scale features of wheat spikes in complex environments, such as those with occlusions, overlaps, and extreme lighting conditions. Additionally, the introduction of GSConv enhances the network precision while reducing the computational costs, thereby controlling the detection speed. Furthermore, the EIoU metric is integrated into the loss function, refined to better focus on partially occluded or overlapping spikes. The testing results on the dataset demonstrate that this method achieves an Average Precision (AP) of 95.7%, surpassing current state-of-the-art object detection methods in both precision and speed. These findings confirm that our approach more closely meets the practical requirements for wheat spike detection compared to existing methods. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

21. Active phase recognition method of hydrogenation catalyst based on multi-feature fusion Mask CenterNet.

Author: Wang, Zhujun, Sun, Tianhe, Li, Haobin, Cui, Ailin, and Bao, Song
Subjects: *IMAGE recognition (Computer vision), *ELECTRON microscopes, *FEATURE extraction, *CATALYSTS, *LEAKAGE
Abstract: In order to realize the intelligent recognition and statistics of hydrogenation catalyst image information, this paper presents a new method to judge the active phase by image recognition, which is different from traditional methods. Firstly, considering that hydrogenation catalyst image targets are small and easy to stack, the feature extraction network in the CenterNet model is optimized by adding the multi-feature fusion module to improve the accuracy of the network in edge positioning. Secondly, according to the linear shape of the hydrogenation catalyst, the mask branch is added to the CenterNet model to train the hydrogenation catalyst stripes with unclear target to reduce the leakage rate of the hydrogenation catalyst. The experimental results show that the detection accuracy of the improved CenterNet network is 91%, 7% higher than that of the original one, with a decline in detection rate by 12%. The method proposed in this paper can accurately identify and segment the hydrogenation catalyst in the electron microscope image, which can provide technical support for the statistics and analysis of the hydrogenation catalyst image. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

22. U-Net with Coordinate Attention and VGGNet: A Grape Image Segmentation Algorithm Based on Fusion Pyramid Pooling and the Dual-Attention Mechanism.

Author: Yi, Xiaomei, Zhou, Yue, Wu, Peng, Wang, Guoying, Mo, Lufeng, Chola, Musenge, Fu, Xinyun, and Qian, Pengxiang
Subjects: *IMAGE segmentation, *PYRAMIDS, *ALGORITHMS, *LEAF spots, *GRAPE yields
Abstract: Currently, the classification of grapevine black rot disease relies on assessing the percentage of affected spots in the total area, with a primary focus on accurately segmenting these spots in images. Particularly challenging are cases in which lesion areas are small and boundaries are ill-defined, hampering precise segmentation. In our study, we introduce an enhanced U-Net network tailored for segmenting black rot spots on grape leaves. Leveraging VGG as the U-Net's backbone, we strategically position the atrous spatial pyramid pooling (ASPP) module at the base of the U-Net to serve as a link between the encoder and decoder. Additionally, channel and spatial dual-attention modules are integrated into the decoder, alongside a feature pyramid network aimed at fusing diverse levels of feature maps to enhance the segmentation of diseased regions. Our model outperforms traditional plant disease semantic segmentation approaches like DeeplabV3+, U-Net, and PSPNet, achieving impressive pixel accuracy (PA) and mean intersection over union (MIoU) scores of 94.33% and 91.09%, respectively. Demonstrating strong performance across various levels of spot segmentation, our method showcases its efficacy in enhancing the segmentation accuracy of black rot spots on grapevines. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

23. POD-YOLO Object Detection Model Based on Bi-directional Dynamic Cross-level Pyramid Network.

Author: Yu Zhang, Ming Ma, Zhongxiang Wang, Jing Li, and Yan Sun
Subjects: *OBJECT recognition (Computer vision), *PYRAMIDS, *SPINE, *INFORMATION networks, *IMAGE processing
Abstract: The existing heavy-backbone object detection models overlook the crucial role of cross-level interactive fusion of feature information in pyramid networks, resulting in the inability to detect occluded objects or small objects in complex scenes. In this paper, we present a new heavy-neck object detection model called POD-YOLO based on YOLOv5s. Firstly, we propose the POD-RepC3 module to increase the model's capability to obtain multi-layer features. Additionally, addressing the issue of large object size span, we propose a bidirectional partial dynamic fusion module (Bi-PDC) as the detection neck of the pyramid network. This module preserves the accurate positioning signals and facilitates cross-level interactive fusion of feature information. Finally, we design the Reparameterized Bi-directional Dynamic Feature Pyramid Network (RepBi-DFPN), a deep feature fusion network that integrates contextual information and enhances both the feature expression and fusion capabilities of our model. Experimental results on the PASCAL VOC dataset are positive: mAP@0.5 and mAP@0.5:0.95 reached 81.3% and 58.2%, respectively, which are 2.4% and 4.1% higher than the original YOLOv5s. Furthermore, the results demonstrate that the model's performance can compete with SOTA object detection models. The algorithm optimizes the feature fusion capability of the pyramid network to effectively decrease the model's false detections and missed detections, and the model's ability to accurately detect multi-scale targets is significantly improved. [ABSTRACT FROM AUTHOR]
Published: 2024

24. Yaru3DFPN: a lightweight modified 3D UNet with feature pyramid network and combine thresholding for brain tumor segmentation.

Author: Akbar, Agus Subhan, Fatichah, Chastine, Suciati, Nanik, and Za'in, Choiru
Subjects: *BRAIN tumors, *DEEP learning, *PYRAMIDS, *MAGNETIC resonance imaging, *SURVIVAL rate, *THREE-dimensional imaging
Abstract: Gliomas are the most common and aggressive form of all brain tumors, with a median survival rate of fewer than two years, especially for the highest-grade glioma patient. Accurate and reproducible brain tumor segmentation is essential for an effective treatment plan and diagnosis to reduce the risk of further spread. Automated brain tumor segmentation is challenging because it can appear in the brain with variations in shape, size, and position from one patient to another. Several deep learning architectures have been created to handle automatic segmentation with good performance results on 3D MRI images. However, these architectures are generally large and require high hardware specifications and a large amount of memory and storage. This paper proposes a lightweight modified 3D UNet architecture with an outstanding performance level called Yaru3DFPN. The architecture is built based on the UNet. The block used is ResNet and is modified to use pre-activation strategies and GroupNormalization for batch normalization. In the expanding section, features are arranged into pyramid features. The final output is thresholded using the combining thresholding method. This architecture is light and fast. This proposal was tested using BraTS datasets with the highest dice performance of 80.90%, 86.27%, and 92.02% for ET, TC, and WT areas, respectively. This result outperformed all other comparative architectures and promised to be developed for clinical application. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
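
The abstract above reports segmentation quality as Dice scores. For reference, here is a minimal sketch of the standard Dice coefficient, 2|A∩B| / (|A| + |B|), on binary masks. It is not the paper's evaluation code: the function name, the flat-list mask input, and the convention of returning 1.0 for two empty masks are assumptions.

```python
def dice_coefficient(pred, target):
    """Dice overlap between two binary masks given as flat sequences of 0/1.

    Hypothetical helper: returns 1.0 when both masks are empty, which is
    one common convention and an assumption here.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Count voxels marked foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total
```

A reported score of 92.02% for a region corresponds to a coefficient of 0.9202 on this scale.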

25. CamGNN: Cascade Graph Neural Network for Camera Re-Localization.

Author: Wang, Li, Jia, Jiale, Dai, Hualin, and Li, Guoyan
Subjects: GRAPH neural networks, CAMERAS, FEATURE extraction, LOCALIZATION (Mathematics), IMAGE representation
Abstract: In response to the inaccurate positioning of traditional camera relocation methods in scenes with large-scale or severe viewpoint changes, this study proposes a camera relocation method based on a cascaded graph neural network to achieve accurate scene relocation. Firstly, the NetVLAD retrieval method, which has advantages in image feature representation and similarity calculation, is used to retrieve the most similar images to a given query image. Then, the feature pyramid is employed to extract features at different scales of these images, and the features at the same scale are treated as nodes of the graph neural network to construct a single-layer graph neural network structure. Secondly, a top–down connection is used to cascade the single-layer graph structures, where the information of nodes in the previous graph is fused into a message node to improve the accuracy of camera pose estimation. To better capture the topological relationships and spatial geometric constraints between images, an attention mechanism is introduced in the single-layer graph structure, which helps to effectively propagate information to the next graph during the cascading process, thereby enhancing the robustness of camera relocation. Experimental results on the public dataset 7-Scenes demonstrate that the proposed method can effectively improve the accuracy of camera absolute pose localization, with average translation and rotation errors of 0.19 m and 6.9°, respectively. Compared to other deep learning-based methods, the proposed method achieves more than 10% improvement in both average translation and rotation accuracy, demonstrating highly competitive localization precision. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

26. Gradient Guided Co-Retention Feature Pyramid Network for LDCT Image Denoising

Author: Zhou, Li, Wang, Dayang, Xu, Yongshun, Han, Shuo, Morovati, Bahareh, Fan, Shuyi, Yu, Hengyong, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Linguraru, Marius George, editor, Dou, Qi, editor, Feragen, Aasa, editor, Giannarou, Stamatia, editor, Glocker, Ben, editor, Lekadir, Karim, editor, and Schnabel, Julia A., editor
Published: 2024
Full Text: View/download PDF

27. YOLO-BS: A Better Object Detection Model for Real-Time Driver Behavior Detection

Author: Xi, Yang, Guo, Jinxin, Ma, Ming, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Huang, De-Shuang, editor, Pan, Yijie, editor, and Guo, Jiayang, editor
Published: 2024
Full Text: View/download PDF

28. Handheld Knife Stick Detection Based on Dual-Path Multi-layer Residuals

Author: Jin, Liuhui, Lu, Quanli, Sui, Chenchen, Chen, Jiyang, Yi, Changle, Jiang, Jiaxuan, Shi, Yanhua, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Huang, De-Shuang, editor, Si, Zhanjun, editor, and Guo, Jiayang, editor
Published: 2024
Full Text: View/download PDF

29. Multi-layer Cross-Scale Coupling Feature Pyramid Network for Food Logo Detection

Author: Zhang, Baisong, Hou, Sujuan, Zhao, Songhui, Hou, Qiang, Li, Xiaojie, Yan, Wuxia, Tsihrintzis, George A., Series Editor, Virvou, Maria, Series Editor, Jain, Lakhmi C., Series Editor, Su, Jianbo, editor, and Qiao, Xiuquan, editor
Published: 2024
Full Text: View/download PDF

30. Dfp-Unet: A Biomedical Image Segmentation Method Based on Deformable Convolution and Feature Pyramid

Author: Yang, Zengzhi, Wei, Yubin, Yu, Xiao, Guan, Jinting, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Yang, De-Nian, editor, Xie, Xing, editor, Tseng, Vincent S., editor, Pei, Jian, editor, Huang, Jen-Wei, editor, and Lin, Jerry Chun-Wei, editor
Published: 2024
Full Text: View/download PDF

31. Improving Pedestrian Attribute Recognition with Dense Feature Pyramid and Mixed Pooling

Author: Xiao, He, Zou, Chen, Chen, Yaosheng, Gong, Sujia, Dong, Siwen, Akan, Ozgur, Editorial Board Member, Bellavista, Paolo, Editorial Board Member, Cao, Jiannong, Editorial Board Member, Coulson, Geoffrey, Editorial Board Member, Dressler, Falko, Editorial Board Member, Ferrari, Domenico, Editorial Board Member, Gerla, Mario, Editorial Board Member, Kobayashi, Hisashi, Editorial Board Member, Palazzo, Sergio, Editorial Board Member, Sahni, Sartaj, Editorial Board Member, Shen, Xuemin, Editorial Board Member, Stan, Mircea, Editorial Board Member, Jia, Xiaohua, Editorial Board Member, Zomaya, Albert Y., Editorial Board Member, Wu, Celimuge, editor, Chen, Xianfu, editor, Feng, Jie, editor, and Wu, Zhen, editor
Published: 2024
Full Text: View/download PDF

32. MsF-HigherHRNet: Multi-scale Feature Fusion for Human Pose Estimation in Crowded Scenes

Author: Yu, Cuihong, Han, Cheng, Zhang, Qi, Zhang, Chao, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Hu, Shi-Min, editor, Cai, Yiyu, editor, and Rosin, Paul, editor
Published: 2024
Full Text: View/download PDF

33. Multi-scale Context Aggregation for Video-Based Person Re-Identification

Author: Wu, Lei, Zhang, Canlong, Li, Zhixin, Hu, Liaojie, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Luo, Biao, editor, Cheng, Long, editor, Wu, Zheng-Guang, editor, Li, Hongyi, editor, and Li, Chaojie, editor
Published: 2024
Full Text: View/download PDF

34. A Lightweight Method for Small Object Detection Models on Unmanned Aerial Vehicles Based on L-FPN

Author: Wei Haokun, Liu Jingyi, Chen Jinyong, Chu Boce, Sun Yuxin, Zhu Jin
Subjects: object detection, feature pyramid, model lightweight, remote sensing images, uav, Motor vehicles. Aeronautics. Astronautics, TL1-4050
Abstract: Oriented object detection in remote sensing images is a current research hotspot. Due to the varying heights and equipment used in capturing remote sensing images, the ground sampling distance (GSD) of each image also varies, causing many small objects to be easily overlooked. Existing rotated object detection algorithms are mainly aimed at multi-scale object detection in general scenarios. The feature pyramid network (FPN) has complex and time-consuming fusion computations, which still faces great challenges when deployed on edge devices like UAVs. Therefore, this paper proposes a lightweight method for small object detection in UAVs based on L-FPN. First, normalize the scale according to the GSD information of the image. Second, remove redundant high-level feature maps in the FPN. Finally, adjust the anchor box sizes for small object detection. The method is trained and validated on the DOTA dataset. Results show that compared to the traditional models, the proposed L-FPN-based lightweight method for small object detection in UAVs achieves consistent recognition accuracy, with 2.7% fewer model parameters, 28% smaller model size, and 13.24% faster inference speed.
Published: 2024
Full Text: View/download PDF

35. A Serial Multi-Scale Feature Fusion and Enhancement Network for Amur Tiger Re-Identification.

Author: Xu, Nuo, Ma, Zhibin, Xia, Yi, Dong, Yanqi, Zi, Jiali, Xu, Delong, Xu, Fu, Su, Xiaohui, Zhang, Haiyan, and Chen, Feixiang
Abstract: Simple Summary: The Amur tiger is an endangered species in the world, and effective statistics on its individuals and population through re-identification will contribute to ecological diversity investigation and assessment. Due to the fact that the fur texture features of the Amur tiger contain genetic information, the main method of identifying Amur tigers is to distinguish their fur and facial features. In summary, this paper proposes a serial multi-scale feature fusion and enhancement network for Amur tiger re-identification, and designs a global inverted pyramid multi-scale feature fusion module and a local dual-domain attention feature enhancement module. We aim to enhance the learning of fine-grained features and differences in fur texture by better fusing and enhancing global and local features. Our proposed network and module have achieved excellent results on the public dataset of the ATRW. The Amur tiger is an important endangered species in the world, and its re-identification (re-ID) plays an important role in regional biodiversity assessment and wildlife resource statistics. This paper focuses on the task of Amur tiger re-ID based on visible light images from screenshots of surveillance videos or camera traps, aiming to solve the problem of low accuracy caused by camera perspective, noisy background noise, changes in motion posture, and deformation of Amur tiger body patterns during the re-ID process. To overcome this challenge, we propose a serial multi-scale feature fusion and enhancement re-ID network of Amur tiger for this task, in which global and local branches are constructed. Specifically, we design a global inverted pyramid multi-scale feature fusion method in the global branch to effectively fuse multi-scale global features and achieve high-level, fine-grained, and deep semantic feature preservation. 
We also design a local dual-domain attention feature enhancement method in the local branch, further enhancing local feature extraction and fusion by dividing local feature blocks. Based on the above model structure, we evaluated the effectiveness and feasibility of the model on the public dataset of the Amur Tiger Re-identification in the Wild (ATRW), and achieved good results on mAP, Rank-1, and Rank-5, demonstrating a certain competitiveness. In addition, since our proposed model does not require the introduction of additional expensive annotation information and does not incorporate other pre-training modules, it has important advantages such as strong transferability and simple training. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

36. Visual Clue Guidance and Consistency Matching Framework for Multimodal Named Entity Recognition.

Author: He, Li, Wang, Qingxiang, Liu, Jie, Duan, Jianyong, and Wang, Hao
Subjects: COLLABORATIVE learning
Abstract: The goal of multimodal named entity recognition (MNER) is to detect entity spans in given image–text pairs and classify them into corresponding entity types. Despite the success of existing works that leverage cross-modal attention mechanisms to integrate textual and visual representations, we observe three key issues. Firstly, models are prone to misguidance when fusing unrelated text and images. Secondly, most existing visual features are not enhanced or filtered. Finally, due to the independent encoding strategies employed for text and images, a noticeable semantic gap exists between them. To address these challenges, we propose a framework called visual clue guidance and consistency matching (GMF). To tackle the first issue, we introduce a visual clue guidance (VCG) module designed to hierarchically extract visual information from multiple scales. This information is utilized as an injectable visual clue guidance sequence to steer text representations for error-insensitive prediction decisions. Furthermore, by incorporating a cross-scale attention (CSA) module, we successfully mitigate interference across scales, enhancing the image's capability to capture details. To address the third issue of semantic disparity between text and images, we employ a consistency matching (CM) module based on the idea of multimodal contrastive learning, facilitating the collaborative learning of multimodal data. To validate the effectiveness of our proposed framework, we conducted comprehensive experimental studies, including extensive comparative experiments, ablation studies, and case studies, on two widely used benchmark datasets, demonstrating the efficacy of the framework. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

37. Joint condition monitoring framework of wind turbines based on multi-task learning with poor-quality data.

Author: Ding, Jiawen, Deng, Lei, Li, Qikang, Gu, Xinyu, and Tang, Baoping
Subjects: INFORMATION sharing, PYRAMIDS
Abstract: Effective condition monitoring can improve the reliability of the turbine and reduce its downtime. However, due to the complexity of the operating conditions, the monitoring data is always mixed with poor-quality data. Poor-quality data mixed in monitoring tasks disrupts long-term dependency on data, which challenges traditional condition monitoring methods to work. To solve it, a joint reparameterization feature pyramid network (JRFPN) is proposed. Firstly, three different reparameterization tricks are designed to reform temporal information and exchange cross-temporal information, to alleviate the damage of long-term dependency. Secondly, a joint condition monitoring framework is designed, aiming to suppress feature confounding between poor-quality data and faulty data. The auxiliary task is trained to extract the degradation trend. The main task fights against feature confounding and dynamically delineates the failure threshold. The degradation trend and failure threshold decisions are corrected for each other to make the final joint state inference. Besides, considering the different quality of the monitoring variables, a channel weighting mechanism is designed to strengthen the ability of JRFPN. The measured data proved that JRFPN is more effective than other methods.
• A dynamic channel attention unit (DCAU) to weigh the contribution differences of monitoring variables.
• Adaptive data repair by pixel-level (Re-Param block), scale-level (RepDCConv), and field-level (modified FPS) reparameterization tricks to adaptively adjust the parameter to alleviate the damage of long-term dependency patterns by poor-quality data.
• A main and auxiliary adversarial correction-training mode of the network is designed to dynamically delineate the failure threshold and make the joint state inference.
• A joint condition monitoring framework to maintain very high accuracy and very low FNR and FPR in the presence of large amounts of poor-quality data. Besides, the degradation trend of the device could be observed through PH. The results of the model are interpretable. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

38. Pyramid Progressive Fusion Network for Low-Light Image Enhancement.

Author: 余映, 徐超越, 李淼, 何鹏浩, and 杨昊
Abstract: Copyright of Journal of National University of Defense Technology / Guofang Keji Daxue Xuebao is the property of NUDT Press and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Published: 2024
Full Text: View/download PDF

39. SFPN: segmentation-based feature pyramid network for multi-focus image fusion.

Author: Wu, Pan, Jiang, Limai, Li, Ying, Fan, Hui, and Li, Jinjiang
Subjects: IMAGE fusion, PYRAMIDS, FEATURE extraction, INFORMATION resources, DEEP learning
Abstract: In multi-focus image fusion, different targets often have different sizes, and the network with poor multi-scale feature extraction ability will inevitably lead to the omission of the source image information. Inspired by this, we propose a network that uses the double multi-scale feature pyramid to extract multi-scale features. We design an effective channel compression excitation module and a channel spatial attention module, which form the semantic segmentation mechanism. The mechanism can efficiently extract multi-scale feature maps, maximize the global information of the source image and ignore similar information. We introduce a joint loss function and use post-processing operations to generate smooth decision maps and fused images. The proposed SFPN is compared with seven existing MFIF methods in terms of six objective quantitative metrics and subjective visual effects and achieves superior performance. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

40. A lightweight vehicle mounted multi-scale traffic sign detector using attention fusion pyramid.

Author: Wang, Junfan, Chen, Yi, Gu, Yeting, Yan, Yunfeng, Li, Qi, Gao, Mingyu, and Dong, Zhekang
Subjects: *VEHICLE detectors, *TRAFFIC signs & signals, *TRAFFIC monitoring, *INTELLIGENT transportation systems, *PYRAMIDS
Abstract: Intelligent Transportation System (ITS) aims to strengthen the connection between vehicles, roads, and people. As the important road information in ITS, intelligent detection of traffic signs has become an important part in the intelligent vehicle. In this paper, a lightweight vehicle mounted multi-scale traffic sign detector is proposed. First, guided by the attention fusion algorithm, an improved feature pyramid network is proposed, named AFPN. Assign weights according to the importance of information and fuse multi-dimensional attention maps to improve feature extraction and information retention capabilities. Second, a multi-head detection structure is designed to improve the positioning and detection capability of the detector. According to the target scale, the corresponding detection head is constructed to improve the target detection accuracy. The experimental results show that compared with other state-of-the-art methods, the proposed method not only has excellent detection accuracy with 50.3% for small targets and 64.8% for large targets but also can better trade-off detection speed and detection accuracy. Furthermore, the proposed detector is deployed on the Jetson Xavier NX and integrated with the vehicle-mounted camera, inverter, and LCD to realize real-time traffic sign detection on the vehicle terminal, and the speed reaches 25.6 FPS. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

41. Vehicle object counting network based on feature pyramid split attention mechanism.

Author: Liu, Mingsheng, Wang, Yu, Yi, Hu, and Huang, Xiaohui
Subjects: *OBJECT recognition (Computer vision), *PYRAMIDS, *TRAFFIC congestion, *COUNTING, *PEDESTRIANS, *AUTOMOBILE license plates
Abstract: In recent years, real-time vehicle congestion detection has become a hot research topic in the field of transportation due to the frequent occurrence of highway traffic jams. Vehicle congestion detection generally adopts a vehicle counting algorithm based on object detection, but it is not effective in scenarios with large changes in vehicle scale, dense vehicles, background clutter, and severe occlusion. A vehicle object counting network based on a feature pyramid split attention mechanism is proposed for accurate vehicle counting and the generation of high-quality vehicle density maps in highly congested scenarios. The network extracts rich contextual features by using blocks at different scales, and then obtains a multi-scale feature mapping in the channel direction using kernel convolution of different sizes, and uses the channel attention module at different scales separately to allow the network to focus on features at different scales to obtain an attention vector in the channel direction to reduce mis-estimation of background information. Experiments on the vehicle datasets TRANCOS, CARPK, and HS-Vehicle show that the proposed method outperforms most existing counting methods based on detection or density estimation. The relative improvement in MAE metrics is 90.5% for the CARPK dataset compared to Fast R-CNN and 73.0% for the HS-Vehicle dataset compared to CSRNet. In addition, the method is also extended to count other objects, such as pedestrians in the ShanghaiTech dataset, and the proposed method effectively reduces the misrecognition rate and achieves higher counting performance compared to the state-of-the-art methods. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

42. An N-Shaped Lightweight Network with a Feature Pyramid and Hybrid Attention for Brain Tumor Segmentation.

Author: Chi, Mengxian, An, Hong, Jin, Xu, and Nie, Zhenguo
Subjects: *BRAIN tumors, *PYRAMIDS, *NOMOGRAPHY (Mathematics), *CLINICAL medicine
Abstract: Brain tumor segmentation using neural networks presents challenges in accurately capturing diverse tumor shapes and sizes while maintaining real-time performance. Additionally, addressing class imbalance is crucial for achieving accurate clinical results. To tackle these issues, this study proposes a novel N-shaped lightweight network that combines multiple feature pyramid paths and U-Net architectures. Furthermore, we ingeniously integrate hybrid attention mechanisms into various locations of depth-wise separable convolution module to improve efficiency, with channel attention found to be the most effective for skip connections in the proposed network. Moreover, we introduce a combination loss function that incorporates a newly designed weighted cross-entropy loss and dice loss to effectively tackle the issue of class imbalance. Extensive experiments are conducted on four publicly available datasets, i.e., UCSF-PDGM, BraTS 2021, BraTS 2019, and MSD Task 01 to evaluate the performance of different methods. The results demonstrate that the proposed network achieves superior segmentation accuracy compared to state-of-the-art methods. The proposed network not only improves the overall segmentation performance but also provides a favorable computational efficiency, making it a promising approach for clinical applications. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
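
The abstract above describes a combination loss built from a weighted cross-entropy term and a Dice term to counter class imbalance. The paper's newly designed weighting is not given in the abstract, so the sketch below uses a generic positive-class weight instead. All names (`weighted_bce`, `soft_dice_loss`, `combined_loss`) and the parameters `pos_weight`, `dice_weight`, and `smooth` are hypothetical, and the binary, flat-list setting is a simplification.

```python
import math


def weighted_bce(probs, targets, pos_weight=1.0, eps=1e-7):
    """Binary cross-entropy with an extra weight on the positive class.

    Generic stand-in, not the paper's weighting scheme.
    """
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(pos_weight * t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def soft_dice_loss(probs, targets, smooth=1.0):
    """One minus the soft Dice overlap of predicted probabilities and labels."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denom + smooth)


def combined_loss(probs, targets, pos_weight=1.0, dice_weight=1.0):
    """Sum of the weighted cross-entropy and a scaled Dice loss."""
    return (weighted_bce(probs, targets, pos_weight)
            + dice_weight * soft_dice_loss(probs, targets))
```

Raising `pos_weight` above 1 makes missed tumor pixels cost more than false alarms, while the Dice term scores region overlap and is less sensitive to how rare the foreground class is.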

43. A Lightweight Method for Small Object Recognition Models on UAVs Based on L-FPN.

Author: 魏昊坤, 刘敬一, 陈金勇, 楚博策, 孙裕鑫, and 朱进
Abstract: Copyright of Aero Weaponry is the property of Aero Weaponry Editorial Office and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Published: 2024
Full Text: View/download PDF

44. Feature pyramid-based convolutional neural network image inpainting.

Author: Wang, Shengbo and Wang, Xiuyou
Abstract: Deep learning-based methods are widely used in image processing and have achieved remarkable results. However, these methods often fill regions incorrectly when dealing with irregularly damaged images. The main reason is that the low-level information of the feature maps is not fully utilized, and the semantic information of feature maps at different scales cannot complement each other effectively. Therefore, we propose a network structure based on a feature pyramid. In the first stage, we set the expansion (dilation) factor so as to avoid the gridding effect and enlarge the receptive field, while making maximal use of the low-level feature map information. The second stage uses a feature fusion branch, which first samples the feature maps to construct the feature pyramid, then fuses feature maps with different resolutions and semantic strengths, and finally generates an image by deconvolution of the feature maps in a decoder. Our experimental results show that this method generates recovered regions that are coherent, clear, and visually reasonable, with image quality superior to that of other methods. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
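
If the expansion factor in entry 44 is read as a convolution dilation rate (an assumption; the record does not define it), its effect on the receptive field can be worked out with simple arithmetic. The sketch below assumes stride-1 layers with a shared kernel size; the function names are illustrative only.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by one dilated kernel: d * (k - 1) + 1."""
    return dilation * (kernel_size - 1) + 1


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated convolutions.

    Each layer widens the field by (k - 1) * d pixels.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf
```

For example, three 3x3 layers with rates 1, 2, 5 give `receptive_field(3, [1, 2, 5])`, a field of 17 pixels, against 7 pixels for three undilated layers. Mixing rates instead of repeating one rate is a common way to limit gridding, though the record does not say which rates the authors chose.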

45. Counting Method Based on Density Graph Regression and Object Detection

Author: GAO Jie, ZHAO Xinxin, YU Jian, XU Tianyi, PAN Li, YANG Jun, YU Mei, LI Xuewei
Subjects: intensive count, target detection, deep learning, density map regression, feature pyramid, Electronic computers. Computer science, QA75.5-76.95
Abstract: In response to the low recall rate of detection-based methods and the problem of missing target location information in density-based methods, which are the two mainstream dense-counting methods, a detection and counting method based on density map regression is proposed by combining the two tasks, achieving the counting and positioning of target objects in dense scenes. Complementing the advantages of two methods not only improves recall rate but also calibrates all targets. To extract richer feature information to deal with complex data scenarios, a feature pyramid optimization module is proposed, which vertically fuses low-level high-resolution features with top-level abstract semantic features and horizontally fuses same-size features to enrich the semantic expression of target objects. To address the issue of low pixel proportions occupied by target objects in dense counting scenarios, an attention mechanism for small targets is proposed to improve the network’s detection sensitivity, which can enhance the attention of the network to target objects by constructing a mask on the input image. Experimental results demonstrate that the proposed method significantly improves recall rate and accurately locates targets while maintaining accuracy, effectively providing counting and positioning information of input image, which has a wide range of application prospects in various fields such as industry and ecology.
Published: 2024
Full Text: View/download PDF
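
The vertical fusion of low-level high-resolution features with top-level semantic features described in entry 45 resembles the usual top-down pathway of a feature pyramid. The sketch below is a minimal single-channel illustration under that assumption, using nearest-neighbour upsampling and element-wise addition; it omits the lateral convolutions and the horizontal same-size fusion, and it is not the authors' module.

```python
def upsample_nearest_2x(fmap):
    """Double the height and width of a 2D map by repeating each value."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat along width
        out.append(wide)
        out.append(list(wide))  # repeat along height
    return out


def fuse_top_down(high_res, low_res):
    """Add an upsampled coarse map to a fine map of twice its size."""
    up = upsample_nearest_2x(low_res)
    return [[a + b for a, b in zip(fine_row, up_row)]
            for fine_row, up_row in zip(high_res, up)]
```

Calling `fuse_top_down([[1, 1], [1, 1]], [[10]])` spreads the single coarse value over the whole 2x2 fine map before adding it, which is how coarse semantic context reaches high-resolution positions.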

46. Integrated Neural Network-Based Pupil Tracking Technology for Wearable Gaze Tracking Devices in Flight Training

Author: Heming Zhang and Changyuan Wang
Subjects: Pupil-tracking, hybrid neural network, feature pyramid, ViT, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Pupil tracking is a detection method that uses eye images to extract the real-time position of the pupil. Detecting a pilot’s eye movement patterns and characteristics from pupil movement signals is an important part of monitoring the pilot’s physiological characteristics. Current pupil tracking algorithms are prone to insufficient tracking accuracy and discontinuous pupil signals when student pilots’ pupils are occluded by frequent blinking or when pupil information is lost in low-light environments during flight training. To increase the accuracy of pilot pupil tracking, this paper designs an integrated neural network-based pupil tracking technology for wearable gaze tracking devices in flight training. To address these problems, this paper builds a pupil positioning model based on a hybrid neural network that combines a feature pyramid with a ViT network. On this basis, we build a hybrid neural network pupil tracking model for occluded pupil images, based on the characteristics of pilot eye data collected during flight training, and design a new loss function suited to pupil detection. In verification, the proposed pupil tracking algorithm significantly improves visual tracking accuracy compared with existing methods, with an error of less than 5 pixels, and the tracking accuracy can reach up to 85%. In pilot flight training, the algorithm tracks pupils more stably, effectively reduces pupil signal interference caused by pupil occlusion, and achieves more accurate real-time pupil tracking.
Published: 2024
Full Text: View/download PDF

47. DFP-Net: A Crack Segmentation Method Based on a Feature Pyramid Network.

Author: Li, Linjing, Liu, Ran, Ali, Rashid, Chen, Bo, Lin, Haitao, Li, Yonglong, and Zhang, Hua
Subjects: PYRAMIDS, FEATURE extraction, IMAGE segmentation
Abstract: Timely detection of defects is essential for ensuring the safe and stable operation of concrete buildings. Automatic segmentation of concrete buildings' surfaces is challenging due to the high diversity of crack appearance, the fine detail involved, and the unbalanced proportion of crack pixels and background pixels. In this work, the Double Feature Pyramid Network is designed for high-precision crack segmentation. Our work reaches the state-of-the-art level in crack segmentation, with key contributions as follows. First, considering the diversity of crack shapes, the network constructs a feature pyramid containing three feature extraction backbones to extract global feature maps from input images at three scales. Second, because the biggest challenge is the large amount of single-pixel crack area, a targeted feature pyramid based on high resolution is added to extract adequate shallow semantic information. Lastly, a cascade feature fusion unit is designed to aggregate the extracted multi-dimensional feature maps and obtain the final prediction. Compared with existing crack detection methods, the superior performance of this method has been verified in extensive experiments, with Pixel Accuracy of 65.99%, Intersection over Union of 44.71%, and Recall of 62.95%, providing a reliable and efficient solution for the health monitoring and maintenance of concrete structures. This work contributes to the advancement of research and practical applications in related fields, offering robust support for the monitoring and maintenance of concrete structures. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
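
The three scores quoted in entry 47 can be computed from a binary prediction mask and a ground-truth mask. The sketch below uses the textbook definitions over flattened 0/1 masks, which is an assumption: the record does not state how the authors define Pixel Accuracy, and crack-segmentation papers sometimes use the name for a different ratio.

```python
def confusion_counts(pred, truth):
    """Count true/false positives and negatives over two flat 0/1 masks."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(pred, truth):
    """Pixel accuracy, IoU, and recall for the foreground (crack) class."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    total = tp + fp + fn + tn
    union = tp + fp + fn
    positives = tp + fn
    return {
        "pixel_accuracy": (tp + tn) / total if total else 0.0,
        "iou": tp / union if union else 0.0,
        "recall": tp / positives if positives else 0.0,
    }
```

Because background pixels dominate crack images, IoU and recall on the crack class are usually more informative than overall pixel accuracy, which a mostly empty prediction can inflate.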

48. Smart grid line fault detection based on deep learning image recognition algorithm.

Author: Huang, Jianfeng and Wan, Qiang
Abstract: Amid the rapid evolution of smart grids, stringent demands for reliability, safety, and efficiency have escalated for transmission lines. Nevertheless, given their extensive coverage and the complexity of their environments, traditional inspection techniques struggle to meet the rigorous standards of real-time monitoring and precision. This paper introduces a novel fault detection approach for smart grid transmission lines, leveraging advanced deep-learning image recognition algorithms. This method improves the YOLOv5 series models by combining the Convolution Attention Module (CBAM), Bidirectional Feature Pyramid Network (BiFPN), and MobileNet-V3 as the feature extraction network of YOLOv5 to achieve defect detection of hardware on transmission lines. CBAM improves the model's sensitivity to tiny defects by focusing on key areas and features. BiFPN improves detection accuracy and robustness through efficient fusion of multi-scale features. Furthermore, the integration of MobileNet-V3 enhances both the efficiency and precision of feature extraction, ultimately elevating the overall model's performance. Experimental results show that compared with the original YOLOv5 model, the improved algorithm can achieve 92.89% accuracy and 84.10% recall, which is better than other mainstream target detection algorithms. Especially in the detection of defects in small targets and complex backgrounds, the improved model shows stronger robustness and adaptability. This provides strong technical support for the efficient operation and maintenance of smart grids. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
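
BiFPN, named in entry 48, fuses multi-scale features with learnable non-negative weights. The sketch below shows the fast normalized fusion rule commonly associated with BiFPN, applied to already-resized feature vectors; whether the cited paper uses this rule unchanged is an assumption, and the function name and `eps` default are illustrative.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Weighted average of same-shaped features with ReLU-clipped weights.

    out[i] = sum_k(w_k * features[k][i]) / (eps + sum_k(w_k)),
    where each w_k = max(0, weights[k]).
    """
    w = [max(0.0, x) for x in weights]  # ReLU keeps weights non-negative
    norm = eps + sum(w)
    length = len(features[0])
    return [sum(wk * f[i] for wk, f in zip(w, features)) / norm
            for i in range(length)]
```

In a trained network the weights would be learned per fusion node; here they are passed in directly so the normalization is easy to inspect.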

49. Pedestrian Detection Based on Feature Enhancement in Complex Scenes.

Author: Su, Jiao, An, Yi, Wu, Jialin, and Zhang, Kai
Subjects: *PEDESTRIANS, *TRANSPORTATION security measures, *COMPUTER vision, *KNOWLEDGE transfer, *PROBLEM solving
Abstract: Pedestrian detection has long been a difficult and active topic in computer vision research. Pedestrian detection technology also plays an important role in many applications, such as intelligent transportation and security monitoring. In complex scenes, pedestrian detection often suffers from low detection accuracy and misdetection due to small target sizes and scale variations. To solve these problems, this paper proposes PT-YOLO, a pedestrian detection network based on YOLOv5. PT-YOLO consists of the YOLOv5 network, the squeeze-and-excitation (SE) module, the weighted bi-directional feature pyramid network (BiFPN) module, the coordinate convolution (coordconv) module, and the wise intersection over union (WIoU) loss function. The SE module in the backbone allows the network to focus on the important features of pedestrians and improves accuracy. The weighted BiFPN module enhances the fusion of multi-scale pedestrian features and information transfer, which can improve fusion efficiency. The prediction head uses the WIoU loss function to reduce the regression error. The coordconv module allows the network to better perceive location information in the feature map. The experimental results show that PT-YOLO is more accurate than other target detection methods in pedestrian detection and can effectively accomplish pedestrian detection in complex scenes. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

50. Traffic Sign Detection Algorithm Based on Improved YOLOv8s.

Author: Xiaoming Zhang and Ying Tian
Subjects: *TRAFFIC monitoring, *TRAFFIC signs & signals, *ALGORITHMS, *NETWORK performance, *LEARNING modules, *HUMAN fingerprints, *COORDINATES
Abstract: Aiming at the problems of low accuracy, false detection, missed detection, and low real-time detection of current traffic sign detection, this paper proposes an improved traffic sign detection algorithm based on the YOLOv8s algorithm. Firstly, this paper proposes a double-layer semicomposite backbone network structure (DSCB), which uses the auxiliary backbone network to extract features, and then transmits the extracted features to the backbone network to enhance the ability of the backbone network to extract target features. At the same time, the deformable convolution is integrated into the DC2f structure of the auxiliary backbone network to enhance the generalization performance of the network. Secondly, the coordinate attention mechanism is used after the SPPF layer. The coordinate attention mechanism can better retain the coordinate position information of small targets, reduce the miss rate of the model, and increase detection accuracy. Finally, this paper introduces a new CAB module to learn and aggregate the output of each layer of the feature pyramid for global spatial context to enhance the feature representation ability further. The experimental results show that the improved algorithm achieves 90.51% detection accuracy, 82.00% recall rate, 89.51% mAP@0.5 on the TT100K dataset, and the FPS reaches 106. Compared with the original algorithm model, the detection accuracy is increased by 2.27%, and the recall rate is increased by 2.48%. mAP@0.5 is increased by 2.01%, and FPS is increased by 1. The improved traffic sign detection algorithm meets the requirements in detection accuracy and real-time detection. [ABSTRACT FROM AUTHOR]
Published: 2024
