80 results for "Detection transformer"
Search Results
2. Automated dimensional quality inspection of super-large steel mesh using fixed-spacing detection transformer and improved Oriented FAST and Rotated BRIEF
- Author: Guo, Xinfei, Huang, Yimiao, Zhang, Shaopeng, and Ma, Guowei
- Published: 2025
3. Enhancing Object Detection Accuracy with Hybrid Supervision and Trans-Stage Interaction
- Author: Wang, Wenlong, and Hua, Pinyan
- Editors: Hadfi, Rafik, Anthony, Patricia, Sharma, Alok, Ito, Takayuki, and Bai, Quan
- Series board: Goos, Gerhard (Series Editor); Hartmanis, Juris (Founding Editor); Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti (Editorial Board Members)
- Published: 2025
4. BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos
- Author: Lee, Pilhyeon, and Byun, Hyeran
- Editors: Leonardis, Aleš, Ricci, Elisa, Roth, Stefan, Russakovsky, Olga, Sattler, Torsten, and Varol, Gül
- Series board: Goos, Gerhard (Series Editor); Hartmanis, Juris (Founding Editor); Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti (Editorial Board Members)
- Published: 2025
5. Effective Lymph Nodes Detection in CT Scans Using Location Debiased Query Selection and Contrastive Query Representation in Transformer
- Author: Yu, Qinji, Wang, Yirui, Yan, Ke, Li, Haoshen, Guo, Dazhou, Zhang, Li, Shen, Na, Wang, Qifeng, Ding, Xiaowei, Lu, Le, Ye, Xianghua, and Jin, Dakai
- Editors: Leonardis, Aleš, Ricci, Elisa, Roth, Stefan, Russakovsky, Olga, Sattler, Torsten, and Varol, Gül
- Series board: Goos, Gerhard (Series Editor); Hartmanis, Juris (Founding Editor); Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti (Editorial Board Members)
- Published: 2025
6. Enhanced multi-scale trademark element detection using the improved DETR
- Author: Longwen Li, Xiuhui Wang, and Wei Qi Yan
- Subjects: Detection transformer, Attention mechanism, Multi-scale feature fusion, Trademark retrieval, Medicine, Science
- Abstract
The exponential growth in the number of registered trademarks, coupled with escalating incidents of trademark infringement, has made the automatic detection of such infringements a crucial area of study in market regulation. Given the diverse range of elements and the pervasive presence of small targets in trademark images, we present an enhanced version of the DETR-based Multi-Scale Trademark Element Detection Network (MSTED-Net). Our primary innovation is a dual fusion mechanism that integrates the Spatial Attention Module (SAM) and the Global Context Network (GCNet) within the backbone network, providing a more robust way to capture the essential characteristics of trademark images. We then develop a Multi-scale Feature Augmentation Pyramid (MFA-FPN) to further strengthen the model’s feature extraction and improve the detection of small targets. Experimental results demonstrate the efficacy of the proposed network, which achieves a detection accuracy of 91.12%, outperforming other state-of-the-art detection algorithms.
- Published: 2024
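The MSTED-Net abstract above names a Global Context Network (GCNet) module in the backbone. As a rough illustration of what a GCNet-style global context block computes (attention pooling over all positions, a bottleneck transform, then broadcast addition), here is a minimal NumPy sketch. The function names, weight shapes, and the LayerNorm without learnable scale/shift are assumptions for illustration; this is not the MSTED-Net implementation.

```python
import numpy as np

def layer_norm(v, eps=1e-5):
    # LayerNorm over a 1-D vector, without learnable scale/shift (simplification).
    return (v - v.mean()) / np.sqrt(v.var() + eps)

def global_context_block(x, w_k, w_v1, w_v2):
    """GCNet-style global context block on a (C, H, W) feature map.

    w_k:  (C,)       1x1 conv producing one attention logit per position
    w_v1: (C_r, C)   bottleneck 1x1 conv (C_r = C / reduction ratio)
    w_v2: (C, C_r)   1x1 conv back to C channels
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)              # (C, N), N = H*W positions
    logits = w_k @ flat                     # (N,) one score per position
    attn = np.exp(logits - logits.max())
    attn /= attn.sum()                      # softmax over all positions
    context = flat @ attn                   # (C,) attention-pooled global context
    # Transform: 1x1 conv -> LayerNorm -> ReLU -> 1x1 conv
    hidden = np.maximum(layer_norm(w_v1 @ context), 0.0)
    delta = w_v2 @ hidden                   # (C,) one value per channel
    # Fusion: broadcast-add the same context vector to every position
    return x + delta[:, None, None]

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8, 8))
out = global_context_block(x, rng.standard_normal(16),
                           rng.standard_normal((4, 16)), rng.standard_normal((16, 4)))
print(out.shape)  # (16, 8, 8)
```

Because the context vector is added identically at every position, the block injects image-wide information cheaply; the per-position difference `out - x` is the same across the spatial grid.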
7. A Novel Detection Transformer Framework for Ship Detection in Synthetic Aperture Radar Imagery Using Advanced Feature Fusion and Polarimetric Techniques.
- Author: Ahmed, Mahmoud, El-Sheimy, Naser, and Leung, Henry
- Subjects: Convolutional neural networks, Decomposition method, Feature extraction, Signal-to-noise ratio, Synthetic aperture radar, Radar, Synthetic apertures
- Abstract
Ship detection in synthetic aperture radar (SAR) imagery faces significant challenges due to the limitations of traditional methods, such as convolutional neural network (CNN) and anchor-based matching approaches, which struggle with accurately detecting smaller targets as well as adapting to varying environmental conditions. These methods, relying on either intensity values or single-target characteristics, often fail to enhance the signal-to-clutter ratio (SCR) and are prone to false detections due to environmental factors. To address these issues, a novel framework is introduced that leverages the detection transformer (DETR) model along with advanced feature fusion techniques to enhance ship detection. This feature enhancement DETR (FEDETR) module manages clutter and improves feature extraction through preprocessing techniques such as filtering, denoising, and applying maximum and median pooling with various kernel sizes. Furthermore, it combines metrics like the line spread function (LSF), peak signal-to-noise ratio (PSNR), and F1 score to predict optimal pooling configurations and thus enhance edge sharpness, image fidelity, and detection accuracy. Complementing this, the weighted feature fusion (WFF) module integrates polarimetric SAR (PolSAR) methods such as Pauli decomposition, coherence matrix analysis, and feature volume and helix scattering (Fvh) components decomposition, along with FEDETR attention maps, to provide detailed radar scattering insights that enhance ship response characterization. Finally, by integrating wave polarization properties, the ability to distinguish and characterize targets is augmented, thereby improving SCR and facilitating the detection of weakly scattered targets in SAR imagery. Overall, this new framework significantly boosts DETR's performance, offering a robust solution for maritime surveillance and security. [ABSTRACT FROM AUTHOR]
- Published: 2024
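The FEDETR abstract above lists Pauli decomposition among the PolSAR methods fed into its weighted feature fusion module. For readers unfamiliar with it, a minimal NumPy sketch of the standard Pauli decomposition follows, assuming a reciprocal medium (S_HV equal to S_VH) and complex-valued per-pixel scattering channels. The function names and the RGB normalization are illustrative assumptions, not the paper's code.

```python
import numpy as np

def pauli_decomposition(s_hh, s_hv, s_vv):
    """Pauli components of a reciprocal (S_HV == S_VH) scattering matrix, per pixel."""
    k1 = (s_hh + s_vv) / np.sqrt(2)   # odd-bounce (surface-like) scattering
    k2 = (s_hh - s_vv) / np.sqrt(2)   # even-bounce (double-bounce) scattering
    k3 = np.sqrt(2) * s_hv            # 45-degree-rotated targets, often linked to volume scattering
    return k1, k2, k3

def pauli_rgb(s_hh, s_hv, s_vv):
    # Common display convention: R = |k2|, G = |k3|, B = |k1|,
    # each channel divided by its own maximum to fit [0, 1].
    k1, k2, k3 = pauli_decomposition(s_hh, s_hv, s_vv)
    channels = [np.abs(k2), np.abs(k3), np.abs(k1)]
    return np.stack([ch / (ch.max() + 1e-12) for ch in channels], axis=-1)
```

A useful sanity check is that the decomposition preserves total power: |k1|² + |k2|² + |k3|² equals the span |S_HH|² + 2|S_HV|² + |S_VV|² at every pixel.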
8. Casting-DETR: An End-to-End Network for Casting Surface Defect Detection.
- Author: Pu, Quan-cheng, Zhang, Hui, Xu, Xiang-rong, Zhang, Long, Gao, Ju, Rodić, Aleksandar, Petrovic, Petar B., Wang, Hai-yan, Xu, Shan-shan, and Wang, Zhi-xiong
- Subjects: Surface defects, Computer vision, Deep learning, Transformer models
- Abstract
Machine-vision detection of casting surface defects involves small targets and requires real-time performance and easy portability. Applying current mainstream object detection networks directly to this task yields low accuracy and efficiency. In this paper, we therefore introduce Casting-DETR, an end-to-end network designed for casting surface defect detection. To assess and validate the model's performance, we used 554 images of casting samples with surface defects. Casting-DETR achieved a detection rate of 98.97% on the test set, with a single-image detection time of 91.5 ms. Furthermore, a real-time detection system built with PyQt6 was tested in four different environments, where Casting-DETR maintained a single-frame detection time of approximately 90 ms, demonstrating the model's robustness and suitability for real-time detection. The proposed Casting-DETR network is an end-to-end solution with rapid convergence, high detection accuracy, and fast detection speed, offering a fresh perspective for similar detection tasks in industry. [ABSTRACT FROM AUTHOR]
- Published: 2024
9. Motion Prediction and Object Detection for Image-Based Visual Servoing Systems Using Deep Learning.
- Author: Hao, Zhongwen, Zhang, Deli, and Honarvar Shakibaei Asli, Barmak
- Subjects: Particle swarm optimization, Object recognition (Computer vision), Time series analysis, Robotics, Algorithms
- Abstract
This study investigates object detection and time-series prediction methods for image-based visual servoing systems, aiming to capture targets better and to predict the motion trajectory of robotic arms in advance, thereby enhancing the system's performance and reliability. The research first implements object detection on the VOC2007 dataset using the Detection Transformer (DETR) and achieves good detection scores. The particle swarm optimization algorithm and 3-5-3 polynomial interpolation were used for trajectory planning, and a dedicated dataset was created through simulation. This dataset contains randomly generated trajectories within the workspace, closely simulating actual working conditions. Notably, the Bidirectional Long Short-Term Memory (BiLSTM) model was improved by replacing its traditional Multilayer Perceptron (MLP) components with Kolmogorov–Arnold Networks (KANs). KANs, inspired by the Kolmogorov–Arnold representation theorem, improve representational capacity by placing learnable activation functions on the network's edges instead of the fixed activation functions on nodes used by MLPs. By using KANs, the model improves parameter efficiency and interpretability, addressing typical drawbacks of MLPs such as high parameter counts and lack of transparency. The experiments achieved favorable predictive results, indicating that the KAN not only reduces model complexity but also improves learning efficiency and prediction accuracy in dynamic visual servoing environments. Finally, the robotic arm was modeled and simulated with Gazebo in ROS to verify the effectiveness of the algorithm and achieve visual servoing. [ABSTRACT FROM AUTHOR]
- Published: 2024
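The visual-servoing abstract above uses particle swarm optimization (PSO) for trajectory planning, but the objective it optimizes (for example, segment times of the 3-5-3 polynomial trajectory) is not given here. The sketch below is therefore only a minimal global-best PSO applied to a toy objective (the sphere function); the function name, hyperparameters, and bounds are assumptions chosen for illustration.

```python
import numpy as np

def pso_minimize(f, dim, n_particles=30, iters=100, bounds=(-5.0, 5.0),
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best particle swarm optimization (PSO)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                              # best position found by each particle
    pbest_val = np.apply_along_axis(f, 1, pos)
    g = pbest[np.argmin(pbest_val)].copy()          # best position found by the whole swarm
    for _ in range(iters):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Inertia + pull toward personal best + pull toward global best
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.apply_along_axis(f, 1, pos)
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = pbest[np.argmin(pbest_val)].copy()
    return g, pbest_val.min()

best_x, best_val = pso_minimize(lambda v: float(np.sum(v ** 2)), dim=3)
print(best_x, best_val)   # best_x should be close to the origin
```

In a trajectory-planning setting, `f` would instead score a candidate set of interpolation parameters (for instance, total motion time subject to joint limits); that mapping is outside what the abstract specifies.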
10. A Step-Wise Domain Adaptation Detection Transformer for Object Detection under Poor Visibility Conditions.
- Author: Zhang, Gege, Wang, Luping, and Chen, Zengping
- Subjects: Transformer models, Knowledge transfer, Weather, Lighting
- Abstract
To address the performance degradation of cross-domain object detection under varied illumination conditions and adverse weather, this paper introduces a novel method called the Step-wise Domain Adaptation DEtection TRansformer (SDA-DETR). Our approach decomposes the adaptation process into three sequential steps, progressively transferring knowledge from a labeled dataset to an unlabeled one using the DETR (DEtection TRansformer) architecture. Each step reduces domain discrepancy, facilitating effective transfer learning. In the first step, a target-like domain is constructed as an auxiliary to the source domain to reduce the domain gap at the image level. Next, we adaptively align source-domain and target-domain features at both global and local levels. To further mitigate model bias toward the source domain, we develop a token-masked autoencoder (t-MAE) that enhances target-domain features at the semantic level. Comprehensive experiments show that SDA-DETR outperforms several popular cross-domain object detection methods on three challenging public driving datasets. [ABSTRACT FROM AUTHOR]
- Published: 2024
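The SDA-DETR abstract above mentions a token-masked autoencoder (t-MAE) but does not describe its masking scheme. As background, the sketch below shows generic MAE-style random token masking: a random subset of tokens is kept for the encoder and the rest are marked for reconstruction. The function name and the default 75% mask ratio (a common MAE setting) are assumptions, not details from the paper.

```python
import numpy as np

def random_token_mask(tokens, mask_ratio=0.75, seed=0):
    """MAE-style random masking of a token sequence.

    tokens: array of shape (N, D), N tokens with D features each.
    Returns the kept tokens, their indices, and a boolean mask (True = masked).
    """
    rng = np.random.default_rng(seed)
    n = tokens.shape[0]
    n_keep = int(round(n * (1 - mask_ratio)))
    perm = rng.permutation(n)                # random shuffle of token indices
    keep_idx = np.sort(perm[:n_keep])        # keep a random subset, in original order
    mask = np.ones(n, dtype=bool)
    mask[keep_idx] = False                   # False = visible to the encoder
    return tokens[keep_idx], keep_idx, mask

tokens = np.arange(40, dtype=float).reshape(20, 2)
kept, keep_idx, mask = random_token_mask(tokens)
print(kept.shape, int(mask.sum()))           # 5 visible tokens, 15 masked
```

In an autoencoder setup, only `kept` would pass through the encoder, and the reconstruction loss would be computed on the positions where `mask` is True.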
11. HiSVision: A Method for Detecting Large-Scale Structural Variations Based on Hi-C Data and Detection Transformer
- Author: Zhai, Haixia, Dong, Chengyao, Wang, Tao, and Luo, Junwei
- Published: 2024
12. A method for analyzing handwritten program flowchart based on detection transformer and logic rules
- Author: Wang, Huiyong, Gao, Shan, and Zhang, Xiaoming
- Published: 2024
13. WH-DETR: An Efficient Network Architecture for Wheat Spike Detection in Complex Backgrounds.
- Author: Yang, Zhenlin, Yang, Wanhong, Yi, Jizheng, and Liu, Rong
- Subjects: Wheat, Data augmentation, Transformer models, Precision farming, Pyramids
- Abstract
Wheat spike detection is crucial for estimating wheat yields and has a significant impact on the modernization of wheat cultivation and the advancement of precision agriculture. This study explores the application of the DETR (Detection Transformer) architecture in wheat spike detection, introducing a new perspective to this task. We propose a high-precision end-to-end network named WH-DETR, which is based on an enhanced RT-DETR architecture. Initially, we employ data augmentation techniques such as image rotation, scaling, and random occlusion on the GWHD2021 dataset to improve the model's generalization across various scenarios. A lightweight feature pyramid, GS-BiFPN, is implemented in the network's neck section to effectively extract the multi-scale features of wheat spikes in complex environments, such as those with occlusions, overlaps, and extreme lighting conditions. Additionally, the introduction of GSConv enhances the network precision while reducing the computational costs, thereby controlling the detection speed. Furthermore, the EIoU metric is integrated into the loss function, refined to better focus on partially occluded or overlapping spikes. The testing results on the dataset demonstrate that this method achieves an Average Precision (AP) of 95.7%, surpassing current state-of-the-art object detection methods in both precision and speed. These findings confirm that our approach more closely meets the practical requirements for wheat spike detection compared to existing methods. [ABSTRACT FROM AUTHOR]
- Published: 2024
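The WH-DETR abstract above integrates the EIoU metric into its loss. In its standard form (from the Focal-EIoU work), EIoU = 1 − IoU + ρ²(centers)/c² + (w − w_gt)²/C_w² + (h − h_gt)²/C_h², where c is the diagonal of the smallest enclosing box and C_w, C_h are that box's width and height. The pure-Python sketch below computes this for one pair of boxes in (x1, y1, x2, y2) format; WH-DETR's refinement for occluded spikes is not reproduced, and the function name is illustrative.

```python
def eiou_loss(pred, gt, eps=1e-9):
    """EIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    # Intersection over union
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1
    union = pw * ph + gw * gh - inter
    iou = inter / (union + eps)
    # Width, height, and squared diagonal of the smallest enclosing box
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2
    # Squared distance between box centers
    rho2 = (((px1 + px2) - (gx1 + gx2)) / 2) ** 2 + (((py1 + py2) - (gy1 + gy2)) / 2) ** 2
    return (1 - iou
            + rho2 / (c2 + eps)                 # center-distance penalty
            + (pw - gw) ** 2 / (cw ** 2 + eps)  # width-difference penalty
            + (ph - gh) ** 2 / (ch ** 2 + eps)) # height-difference penalty
```

For example, shifting a 2×2 box one unit to the right gives IoU = 1/3 and an enclosing box of 3×2, so the loss is 2/3 + 1/13 ≈ 0.744, while identical boxes give a loss of (numerically) zero.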
14. A Detection Transformer-Based Intelligent Identification Method for Multiple Types of Road Traffic Safety Facilities.
- Author: Lu, Lingxin, Wang, Hui, Wan, Yan, and Xu, Feifei
- Subjects: Convolutional neural networks, Road safety measures, Traffic safety
- Abstract
Road traffic safety facilities (TSFs) are of significant importance in the management and maintenance of traffic safety. The complexity and variety of TSFs make manual detection laborious and unsustainable. To enable automatic TSF detection, a target detection dataset, designated TSF-CQU (TSF data collected by Chongqing University), was constructed from images collected by a car recorder. This dataset comprises six types of TSFs and 8410 instance samples. DINO (DETR with Improved deNoising anchOr boxes), a detection transformer, was selected to build a model suited to this scenario. For comparison, Faster R-CNN (Region-based Convolutional Neural Network) and YOLOv7 (You Only Look Once version 7) were also evaluated. The DINO model performed best on the TSF-CQU dataset, with a mean average precision (mAP) of 82.2%. All average precision (AP) values exceeded 0.8 except those for streetlights (AP = 0.77) and rods (AP = 0.648). The DINO model produces very few erroneous recognitions, which supports the efficacy of its contrastive denoising training approach, although it misses a few detections. [ABSTRACT FROM AUTHOR]
- Published: 2024
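The TSF-CQU abstract above credits DINO's contrastive denoising training. In DINO, each ground-truth box yields a positive query with small noise (noise scale below λ1, trained to reconstruct the box) and a negative query with larger noise (between λ1 and λ2, trained to predict "no object"). The sketch below shows only how such noised boxes could be sampled; the λ values, the corner-based noise relative to half width/height, and the function name are assumptions, and label noise and the attention mask that separates denoising groups are omitted.

```python
import numpy as np

def contrastive_denoising_boxes(gt_boxes, lambda1=0.5, lambda2=1.0, seed=0):
    """Sample positive/negative noised boxes from ground-truth boxes.

    gt_boxes: (M, 4) array in (cx, cy, w, h) format.
    Each corner is moved by a fraction of the box's half width/height:
    positives use |fraction| < lambda1, negatives use lambda1 <= |fraction| < lambda2.
    Returns two (M, 4) arrays in (x1, y1, x2, y2) format.
    """
    rng = np.random.default_rng(seed)
    cx, cy, w, h = gt_boxes.T
    corners = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    half = np.stack([w / 2, h / 2, w / 2, h / 2], axis=1)
    m = len(gt_boxes)
    sign = rng.choice([-1.0, 1.0], size=(2, m, 4))            # random direction per corner
    pos_frac = rng.uniform(0.0, lambda1, size=(m, 4))         # small noise -> positive query
    neg_frac = rng.uniform(lambda1, lambda2, size=(m, 4))     # larger noise -> negative query
    positives = corners + sign[0] * pos_frac * half
    negatives = corners + sign[1] * neg_frac * half
    return positives, negatives
```

Training the decoder to reconstruct the positives while rejecting the nearby negatives is what lets DINO-style models suppress duplicate or poorly localized predictions.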
15. Slice-Consistent Lymph Nodes Detection Transformer in CT Scans via Cross-Slice Query Contrastive Learning
- Author: Yu, Qinji, Wang, Yirui, Yan, Ke, Lu, Le, Shen, Na, Ye, Xianghua, Ding, Xiaowei, and Jin, Dakai
- Editors: Linguraru, Marius George, Dou, Qi, Feragen, Aasa, Giannarou, Stamatia, Glocker, Ben, Lekadir, Karim, and Schnabel, Julia A.
- Series board: Goos, Gerhard (Series Editor); Hartmanis, Juris (Founding Editor); Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti (Editorial Board Members)
- Published: 2024
16. A Hybrid Approach for Document Layout Analysis in Document Images
- Author: Shehzadi, Tahira, Stricker, Didier, and Afzal, Muhammad Zeshan
- Editors: Barney Smith, Elisa H., Liwicki, Marcus, and Peng, Liangrui
- Series board: Goos, Gerhard (Series Editor); Hartmanis, Juris (Founding Editor); Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti (Editorial Board Members)
- Published: 2024
17. UI-DETR: GUI Component Detection from the System Screen with Transformers
- Author: Kato, Sotaro, and Shinozawa, Yoshihisa
- Editors: Stephanidis, Constantine, Antona, Margherita, Ntoa, Stavroula, and Salvendy, Gavriel
- Series board: Filipe, Joaquim, Ghosh, Ashish, and Zhou, Lizhu (Editorial Board Members)
- Published: 2024
18. A Fully End-to-End Query-Based Detector with Transformers for Multiscale Ship Detection in SAR Images
- Author: Lin, Hai, Liu, Jin, Li, Xingye, Yu, Zijun, Wu, Zhongdai, and Wang, Junxiang
- Editors: You, Peng, Liu, Shuaiqi, and Wang, Jun
- Series editors: Angrisani, Leopoldo, Arteaga, Marco, Chakraborty, Samarjit, Chen, Jiming, Chen, Shanben, Chen, Tan Kay, Dillmann, Rüdiger, Duan, Haibin, Ferrari, Gianluigi, Ferre, Manuel, Jabbari, Faryar, Jia, Limin, Kacprzyk, Janusz, Khamis, Alaa, Kroeger, Torsten, Li, Yong, Liang, Qilian, Martín, Ferran, Ming, Tan Cher, Minker, Wolfgang, Misra, Pradeep, Mukhopadhyay, Subhas, Ning, Cun-Zheng, Nishida, Toyoaki, Oneto, Luca, Panigrahi, Bijaya Ketan, Pascucci, Federica, Qin, Yong, Seng, Gan Woon, Speidel, Joachim, Veiga, Germano, Wu, Haitao, Zamboni, Walter, and Tan, Kay Chen
- Published: 2024
19. SText-DETR: End-to-End Arbitrary-Shaped Text Detection with Scalable Query in Transformer
- Author: Liao, Pujin, and Wang, Zengfu
- Editors: Liu, Qingshan, Wang, Hanzi, Ma, Zhanyu, Zheng, Weishi, Zha, Hongbin, Chen, Xilin, Wang, Liang, and Ji, Rongrong
- Series board: Goos, Gerhard, and Hartmanis, Juris (Founding Editors); Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti (Editorial Board Members)
- Published: 2024
20. Feature fusion means a lot to DETRs
- Author: Huakai, Xu
- Published: 2024
21. End-to-end semi-supervised approach with modulated object queries for table detection in documents
- Author: Ehsan, Iqraa, Shehzadi, Tahira, Stricker, Didier, and Afzal, Muhammad Zeshan
- Published: 2024
22. Enhanced multi-scale trademark element detection using the improved DETR
- Author: Li, Longwen, Wang, Xiuhui, and Yan, Wei Qi
- Published: 2024
23. IDPD: improved deformable-DETR for crowd pedestrian detection.
- Author: Han, Wenjing, He, Ning, Wang, Xin, Sun, Fengxi, and Liu, Shengjie
- Abstract
Pedestrian detection is an important basis for many pedestrian-related applications and studies and has received extensive attention in recent years. The end-to-end DEtection TRansformer (DETR) avoids the manual design of components and achieves better results than convolutional neural networks in general object detection. Inspired by this, we present the Improved Deformable-DETR for crowd Pedestrian Detection (IDPD). First, we propose a dynamic neck that uses omni-dimensional dynamic convolution to change the number of channels in the neck feature maps, alleviating the loss of pedestrian information caused by reducing the number of feature-map channels. Second, we design a hybrid decoding loss that combines a one-to-one Hungarian matching loss, a one-to-many Hungarian matching auxiliary loss, and a reconstruction loss that recovers full-body boxes from noisy visible-part boxes based on the contrastive denoising method. This loss tackles the slow convergence of Deformable-DETR in crowd pedestrian detection, which is caused by more severe positive/negative sample imbalance and unstable bipartite matching. IDPD was evaluated on the CrowdHuman validation set. With a ResNet-50 backbone, it achieves 93.22% AP, 39.22% MR⁻², and 85.02% JI, outperforming the Deformable-DETR baseline and surpassing CNN-based models. Even better results (94.16% AP, 37.05% MR⁻², and 86.07% JI) are obtained with a Swin-T backbone. [ABSTRACT FROM AUTHOR]
- Published: 2024
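The IDPD abstract above builds on DETR's one-to-one Hungarian matching loss. Before computing that loss, DETR-style training finds the minimum-cost one-to-one assignment between predictions and ground-truth objects. In practice a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` is used; the sketch below instead brute-forces all assignments so that what is being optimized is easy to see. The cost (negative class probability plus weighted L1 box distance, with the generalized-IoU term omitted), the weights, and the function names are illustrative assumptions.

```python
from itertools import permutations

def optimal_one_to_one_match(cost):
    """Exhaustive minimum-cost one-to-one matching (exponential time; illustration only).

    cost[i][j] is the cost of assigning prediction i to ground truth j,
    with at least as many predictions as ground truths.
    Returns sorted (prediction, ground_truth) pairs and the total cost.
    """
    n_pred, n_gt = len(cost), len(cost[0])
    best_pairs, best_total = None, float("inf")
    for preds in permutations(range(n_pred), n_gt):   # preds[j] is matched to gt j
        total = sum(cost[p][j] for j, p in enumerate(preds))
        if total < best_total:
            best_total = total
            best_pairs = [(p, j) for j, p in enumerate(preds)]
    return sorted(best_pairs), best_total

def detr_style_cost(probs, pred_boxes, gt_labels, gt_boxes, w_class=1.0, w_l1=5.0):
    # Cost = -w_class * p(gt class) + w_l1 * L1 box distance
    # (DETR's full matching cost also has a generalized-IoU term, omitted here).
    cost = []
    for p, box in zip(probs, pred_boxes):
        row = []
        for label, gbox in zip(gt_labels, gt_boxes):
            l1 = sum(abs(a - b) for a, b in zip(box, gbox))
            row.append(-w_class * p[label] + w_l1 * l1)
        cost.append(row)
    return cost

probs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
pred_boxes = [(0.1, 0.1, 0.2, 0.2), (0.6, 0.6, 0.2, 0.2), (0.4, 0.4, 0.5, 0.5)]
pairs, total = optimal_one_to_one_match(
    detr_style_cost(probs, pred_boxes, [1, 0], [(0.6, 0.6, 0.2, 0.2), (0.1, 0.1, 0.2, 0.2)]))
print(pairs)   # [(0, 1), (1, 0)]; prediction 2 stays unmatched ("no object")
```

Unmatched predictions are supervised toward the "no object" class, which is why a one-to-one scheme removes the need for non-maximum suppression.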
24. DETR-SPP: a fine-tuned vehicle detection with transformer.
- Author: S P, Krishnendhu and Mohandas, Prabu
- Abstract
Real-time vehicle detection is the most challenging and crucial task in intelligent transportation systems. Speed and accuracy are the most desired qualities of a vehicle detection model. Existing real-time vehicle detection models lack one of these qualities: higher accuracy is achieved at the expense of speed, and vice versa. This makes them unfit for real-time deployment, where speed and accuracy are equally important. Occlusion is also an inevitable factor that complicates detection and reduces the system's accuracy. Furthermore, there is no dedicated model for vehicle detection. This study proposes a better one-stage vehicle detection network, DETR-SPP, based on bipartite matching and a transformer encoder-decoder architecture. The CNN feature-extraction backbone of the DEtection TRansformer (DETR) object detection model is modified to increase real-time detection speed and accuracy. Spatial pyramid pooling is added to remove the fixed-input-size constraint and increase the learning capacity of the network. The network is trained only on the vehicle classes of the MS COCO 2017 dataset: bus, car, motorcycle, and truck. Compared with other state-of-the-art models, DETR-SPP gives higher accuracy in real-time vehicle detection. On the MS COCO 2017 dataset, the proposed model achieves an mAP of 51.31%, 5.19% higher than the DETR baseline model. In a Wilcoxon signed-rank test, DETR-SPP attained a p-value of 0.03. Thus, the proposed DETR-SPP is a better model for vehicle detection. [ABSTRACT FROM AUTHOR]
- Published: 2024
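The DETR-SPP abstract above adds spatial pyramid pooling (SPP) to remove the fixed-size constraint. The sketch below shows the original SPP-net idea in NumPy: each pyramid level splits the feature map into an n × n grid and max-pools every cell, so the output length depends only on the channel count and the chosen levels, not on the input height or width. Where DETR-SPP places this pooling and which levels it uses are not stated here; the levels (1, 2, 4) and the function name are assumptions.

```python
import numpy as np

def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Spatial pyramid max pooling of a (C, H, W) feature map.

    Output length is C * sum(n * n for n in levels), independent of H and W.
    """
    c, h, w = fmap.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor for the start edge, ceil for the end edge: every bin is non-empty
            y0, y1 = (i * h) // n, -((-(i + 1) * h) // n)
            for j in range(n):
                x0, x1 = (j * w) // n, -((-(j + 1) * w) // n)
                pooled.append(fmap[:, y0:y1, x0:x1].max(axis=(1, 2)))
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
a = rng.standard_normal((8, 13, 17))
b = rng.standard_normal((8, 32, 20))
print(spatial_pyramid_pool(a).shape, spatial_pyramid_pool(b).shape)   # both (168,)
```

With levels (1, 2, 4), each channel contributes 1 + 4 + 16 = 21 values, so differently sized inputs map to vectors of the same length.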
25. CCDN-DETR: A Detection Transformer Based on Constrained Contrast Denoising for Multi-Class Synthetic Aperture Radar Object Detection.
- Author: Lei Zhang, Jiachun Zheng, Chaopeng Li, Zhiping Xu, Jiawen Yang, Qiuxin Wei, and Xinyi Wu
- Abstract
The effectiveness of SAR object detection based on Convolutional Neural Networks (CNNs) has been widely demonstrated, and such detectors are increasingly used to recognize ship targets. Recently, efforts have been made to integrate transformer structures into SAR detectors to improve target localization. However, existing methods rarely use the transformer itself as the detector and so fail to fully exploit the long-range modeling advantages of self-attention. Furthermore, there has been limited research on multi-class SAR target detection. To address these limitations, this study proposes a SAR detector named CCDN-DETR, which builds on the detection transformer (DETR) framework. To adapt to the multiscale characteristics of SAR data, cross-scale encoders were introduced to support comprehensive information modeling and fusion across scales. We also optimized the query selection scheme for the decoder input, employing an IoU loss to help initialize object queries more effectively. Additionally, we introduced constrained contrastive denoising training in the decoder layers to speed up convergence and improve the detection of different categories of SAR targets. In a benchmark evaluation on a joint dataset composed of the SSDD, HRSID, and SAR-AIRcraft datasets, CCDN-DETR achieves a mean Average Precision (mAP) of 91.9%. It also remains highly competitive with CNN-based models, reaching 83.7% mAP on the multi-class MSAR dataset. [ABSTRACT FROM AUTHOR]
- Published: 2024
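The CCDN-DETR abstract above says an IoU loss helps initialize the object queries, which resembles RT-DETR-style IoU-aware query selection: encoder tokens are scored by a classifier trained so that confidence also reflects localization quality, and the top-K tokens become the initial decoder queries. The abstract does not spell out the exact scheme, so the sketch below shows only the top-K selection step; the sigmoid scoring, the IoU-aware training target (omitted), and the function name are assumptions.

```python
import numpy as np

def select_top_k_queries(enc_features, class_logits, k):
    """Pick the k encoder tokens with the highest best-class score as initial queries.

    enc_features: (N, D) encoder output tokens
    class_logits: (N, num_classes) per-token classification logits
    Returns the selected features (k, D) and their token indices, most confident first.
    """
    scores = 1.0 / (1.0 + np.exp(-class_logits))   # per-class sigmoid scores
    best = scores.max(axis=1)                      # confidence of each token's best class
    idx = np.argsort(-best)[:k]                    # indices of the k most confident tokens
    return enc_features[idx], idx

enc = np.arange(30, dtype=float).reshape(10, 3)
logits = np.zeros((10, 2))
logits[7, 0], logits[2, 1], logits[5, 0] = 5.0, 4.0, 3.0
feats, idx = select_top_k_queries(enc, logits, k=3)
print(idx)   # [7 2 5]
```

If the classifier is trained with an IoU-aware target, tokens selected this way tend to be both confidently classified and well localized, which is the motivation for using them as query initializations.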
26. Recognition of bird nests on transmission lines based on YOLOv5 and DETR using small samples
- Author: Yanli Yang and Xinlin Wang
- Subjects: Energy Transportation, Transmission line inspection, Bird nest recognition, YOLOv5, Detection transformer, Deep learning, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
The safety and reliability of transmission line operations are critical to the delivery of electricity. Bird nests are a frequent potential threat to the safety of transmission lines. The main objective of this paper is to propose a small-sample learning method for detecting bird nests on transmission lines. The method combines two of the latest deep learning models, YOLOv5 and the detection transformer (DETR). Inspired by biological vision, it transfers what is learned about bird nests in everyday scenes to the recognition of bird nests on transmission lines. The method is evaluated on two public datasets. On the first, it achieves a recognition rate of 95.50% even though the training set contains only ten homologous and 80 non-homologous samples. On the second, 85.54% of the samples are recognized through generalization without any homologous training data. The results show that, under small-sample conditions, our method can identify bird nests on transmission lines with the help of bird nests from everyday scenes.
- Published: 2023
27. A Novel Detection Transformer Framework for Ship Detection in Synthetic Aperture Radar Imagery Using Advanced Feature Fusion and Polarimetric Techniques
- Author: Mahmoud Ahmed, Naser El-Sheimy, and Henry Leung
- Subjects: object detection, detection transformer, feature fusion, polarimetric SAR, Science
- Abstract
Ship detection in synthetic aperture radar (SAR) imagery faces significant challenges due to the limitations of traditional methods, such as convolutional neural network (CNN) and anchor-based matching approaches, which struggle with accurately detecting smaller targets as well as adapting to varying environmental conditions. These methods, relying on either intensity values or single-target characteristics, often fail to enhance the signal-to-clutter ratio (SCR) and are prone to false detections due to environmental factors. To address these issues, a novel framework is introduced that leverages the detection transformer (DETR) model along with advanced feature fusion techniques to enhance ship detection. This feature enhancement DETR (FEDETR) module manages clutter and improves feature extraction through preprocessing techniques such as filtering, denoising, and applying maximum and median pooling with various kernel sizes. Furthermore, it combines metrics like the line spread function (LSF), peak signal-to-noise ratio (PSNR), and F1 score to predict optimal pooling configurations and thus enhance edge sharpness, image fidelity, and detection accuracy. Complementing this, the weighted feature fusion (WFF) module integrates polarimetric SAR (PolSAR) methods such as Pauli decomposition, coherence matrix analysis, and feature volume and helix scattering (Fvh) components decomposition, along with FEDETR attention maps, to provide detailed radar scattering insights that enhance ship response characterization. Finally, by integrating wave polarization properties, the ability to distinguish and characterize targets is augmented, thereby improving SCR and facilitating the detection of weakly scattered targets in SAR imagery. Overall, this new framework significantly boosts DETR’s performance, offering a robust solution for maritime surveillance and security.
- Published: 2024
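The FEDETR abstract above combines peak signal-to-noise ratio (PSNR) with other metrics to choose pooling configurations. PSNR itself is a standard image-fidelity measure, 10 · log10(MAX² / MSE) in decibels; a minimal NumPy version is sketched below. The 8-bit default `max_value=255.0` and the function name are assumptions, and how the paper weights PSNR against LSF and F1 score is not reproduced.

```python
import numpy as np

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    ref = np.asarray(reference, dtype=float)
    tst = np.asarray(test, dtype=float)
    mse = np.mean((ref - tst) ** 2)
    if mse == 0:
        return float("inf")                 # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

ref = np.zeros((4, 4))
noisy = np.full((4, 4), 5.0)                # every pixel off by 5, so MSE = 25
print(round(psnr(ref, noisy), 2))           # about 34.15 dB
```

Higher PSNR means the filtered or pooled image stays closer to the reference, so candidate kernel sizes could be ranked by it alongside sharpness and detection metrics.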
28. Enhancing Object Detection in Remote Sensing: A Hybrid YOLOv7 and Transformer Approach with Automatic Model Selection.
- Author: Ahmed, Mahmoud, El-Sheimy, Naser, Leung, Henry, and Moussa, Adel
- Subjects: Object recognition (Computer vision), Zoning, Image recognition (Computer vision), Localization (Mathematics), Remote sensing
- Abstract
In the remote sensing field, object detection holds immense value for applications such as land use classification, disaster monitoring, and infrastructure planning, where accurate and efficient identification of objects within images is essential for informed decision making. However, high-precision object localization is challenging, because even minor pixel-level errors can significantly affect ground distance measurements. To address this critical challenge, our research introduces a hybrid approach that combines the You Only Look Once version 7 (YOLOv7) and DEtection TRansformer (DETR) algorithms. By bridging the gap between local receptive fields and global context, our approach not only enhances overall object detection accuracy but also promotes precise object localization, a key requirement in remote sensing. A further key advantage of our approach is an automatic selection module that serves as an intelligent decision-making component: it optimizes the choice between YOLOv7 and DETR, further improving object detection accuracy. Finally, we validate the improved performance of the hybrid approach through experiments, confirming its contribution to target recognition and detection in remote sensing images. [ABSTRACT FROM AUTHOR]
- Published: 2024
29. AParC-DETR: Accelerate DETR training by introducing Adaptive Position-aware Circular Convolution
- Author: Guan, Ya’nan, Liao, Shujiao, and Yang, Wenyuan
- Published: 2024
30. Is the Encoder Necessary in DETR-Type Models? Analysis of Encoder Redundancy
- Author: Huan, Liu, Sen, Lin, and Zhi, Han
- Editors: Yang, Huayong, Liu, Honghai, Zou, Jun, Yin, Zhouping, Liu, Lianqing, Yang, Geng, Ouyang, Xiaoping, and Wang, Zhiyong
- Series board: Goos, Gerhard, and Hartmanis, Juris (Founding Editors); Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti (Editorial Board Members)
- Published: 2023
31. DTDT: Highly Accurate Dense Text Line Detection in Historical Documents via Dynamic Transformer
- Author: Li, Haiyang, Liu, Chongyu, Wang, Jiapeng, Huang, Mingxin, Zhou, Weiying, and Jin, Lianwen
- Editors: Fink, Gernot A., Jain, Rajiv, Kise, Koichi, and Zanibbi, Richard
- Series board: Goos, Gerhard, and Hartmanis, Juris (Founding Editors); Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti (Editorial Board Members)
- Published: 2023
32. Fourier Feature-based CBAM and Vision Transformer for Text Detection in Drone Images
- Author: Roy, Ayush, Shivakumara, Palaiahnakote, Pal, Umapada, Mokayed, Hamam, and Liwicki, Marcus
- Editors: Coustaty, Mickael, and Fornés, Alicia
- Series board: Goos, Gerhard, and Hartmanis, Juris (Founding Editors); Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti (Editorial Board Members)
- Published: 2023
33. Use the Detection Transformer as a Data Augmenter
- Author
Wang, Luping, Liu, Bin, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Huang, De-Shuang, editor, Premaratne, Prashan, editor, Jin, Baohua, editor, Qu, Boyang, editor, Jo, Kang-Hyun, editor, and Hussain, Abir, editor
- Published
- 2023
34. A Step-Wise Domain Adaptation Detection Transformer for Object Detection under Poor Visibility Conditions
- Author
Gege Zhang, Luping Wang, and Zengping Chen
- Subjects
domain adaptation, detection transformer, target-like domain, masked autoencoder, Science
- Abstract
To address the performance degradation of cross-domain object detection under various illumination conditions and adverse weather scenarios, this paper introduces a novel method called Step-wise Domain Adaptation DEtection TRansformer (SDA-DETR). Our approach decomposes the adaptation process into three sequential steps, progressively transferring knowledge from a labeled dataset to an unlabeled one using the DETR (DEtection TRansformer) architecture. Each step precisely reduces domain discrepancy, thereby facilitating effective transfer learning. In the initial step, a target-like domain is constructed as an auxiliary to the source domain to reduce the domain gap at the image level. Then, we adaptively align the source domain and target domain features at both global and local levels. To further mitigate model bias towards the source domain, we develop a token-masked autoencoder (t-MAE) to enhance target domain features at the semantic level. Comprehensive experiments demonstrate that the SDA-DETR outperforms several popular cross-domain object detection methods on three challenging public driving datasets.
- Published
- 2024
35. WH-DETR: An Efficient Network Architecture for Wheat Spike Detection in Complex Backgrounds
- Author
Zhenlin Yang, Wanhong Yang, Jizheng Yi, and Rong Liu
- Subjects
deep learning, detection transformer, feature pyramid, wheat spike detection, agriculture, Agriculture (General), S1-972
- Abstract
Wheat spike detection is crucial for estimating wheat yields and has a significant impact on the modernization of wheat cultivation and the advancement of precision agriculture. This study explores the application of the DETR (Detection Transformer) architecture in wheat spike detection, introducing a new perspective to this task. We propose a high-precision end-to-end network named WH-DETR, which is based on an enhanced RT-DETR architecture. Initially, we employ data augmentation techniques such as image rotation, scaling, and random occlusion on the GWHD2021 dataset to improve the model’s generalization across various scenarios. A lightweight feature pyramid, GS-BiFPN, is implemented in the network’s neck section to effectively extract the multi-scale features of wheat spikes in complex environments, such as those with occlusions, overlaps, and extreme lighting conditions. Additionally, the introduction of GSConv enhances the network precision while reducing the computational costs, thereby controlling the detection speed. Furthermore, the EIoU metric is integrated into the loss function, refined to better focus on partially occluded or overlapping spikes. The testing results on the dataset demonstrate that this method achieves an Average Precision (AP) of 95.7%, surpassing current state-of-the-art object detection methods in both precision and speed. These findings confirm that our approach more closely meets the practical requirements for wheat spike detection compared to existing methods.
- Published
- 2024
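The WH-DETR abstract integrates the EIoU metric into its loss and says it was refined for partially occluded or overlapping spikes, without describing the refinement. As a rough illustration only, here is a sketch of the commonly used, unrefined EIoU loss for one predicted box and one target box; the function name, the (x1, y1, x2, y2) box format, and the `eps` guard are assumptions, not the authors' code.

```python
def eiou_loss(pred, target, eps=1e-9):
    """Illustrative, unrefined EIoU loss for one box pair in (x1, y1, x2, y2) format.

    L = 1 - IoU + d_center^2 / (Cw^2 + Ch^2) + (w - w_gt)^2 / Cw^2 + (h - h_gt)^2 / Ch^2,
    where Cw and Ch are the width and height of the smallest box enclosing both.
    This is not WH-DETR's refined variant, which the abstract does not specify.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # IoU term.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    iou = inter / (pw * ph + tw * th - inter + eps)

    # Width and height of the smallest enclosing box.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared center distance, normalized by the enclosing box's squared diagonal.
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Separate width and height penalties.
    w_term = (pw - tw) ** 2 / (cw * cw + eps)
    h_term = (ph - th) ** 2 / (ch * ch + eps)
    return 1.0 - iou + center_term + w_term + h_term
```

The last two terms penalize width and height mismatch directly, so a prediction with the correct center but the wrong size is penalized beyond what the IoU term alone contributes.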
36. A Detection Transformer-Based Intelligent Identification Method for Multiple Types of Road Traffic Safety Facilities
- Author
Lingxin Lu, Hui Wang, Yan Wan, and Feifei Xu
- Subjects
road traffic safety facility, intelligent identification, detection transformer, DINO, Yolov7, Chemical technology, TP1-1185
- Abstract
Road traffic safety facilities (TSFs) are of significant importance in the management and maintenance of traffic safety. The complexity and variety of TSFs make it challenging to detect them manually, which renders the work unsustainable. To achieve the objective of automatic TSF detection, a target detection dataset, designated TSF-CQU (TSF data collected by Chongqing University), was constructed based on images collected by a car recorder. This dataset comprises six types of TSFs and 8410 instance samples. A detection transformer with an improved denoising anchor box (DINO) was selected to construct a model that would be suitable for this scenario. For comparison purposes, Faster R-CNN (Region Convolutional Neural Network) and Yolov7 (You Only Look Once version 7) were employed. The DINO model demonstrated the highest performance on the TSF-CQU dataset, with a mean average precision (mAP) of 82.2%. All of the average precision (AP) values exceeded 0.8, except for streetlights (AP = 0.77) and rods (AP = 0.648). The DINO model exhibits minimal instances of erroneous recognition, which substantiates the efficacy of the contrastive denoising training approach. The DINO model rarely makes misjudgments, but it misses a few detections.
- Published
- 2024
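The DINO results above are reported as per-class AP values and an overall mAP. The abstract does not say which IoU threshold or interpolation protocol was used, so the following is only a sketch of one common convention: all-point interpolated AP per class, with mAP taken as the unweighted mean over classes. The function names and inputs are hypothetical, and matching detections to ground truth is assumed to have been done beforehand.

```python
def average_precision(scores, is_tp, num_gt):
    """Illustrative all-point interpolated AP for one class.

    scores: confidence of each detection of this class.
    is_tp:  whether each detection was matched to a not-yet-claimed ground-truth
            box at the chosen IoU threshold (matching is assumed done already).
    num_gt: number of ground-truth boxes of this class.
    """
    if num_gt == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:  # walk detections from most to least confident
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))

    # Replace each precision by the best precision at this or any later rank.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap):
    """Hypothetical helper: mAP as the unweighted mean of per-class AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)
```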
37. Sign language recognition from digital videos using feature pyramid network with detection transformer.
- Author
Liu, Yu, Nand, Parma, Hossain, Md Akbar, Nguyen, Minh, and Yan, Wei Qi
- Subjects
DEEP learning, TRANSFORMER models, SIGN language, DIGITAL video, CONVOLUTIONAL neural networks, OBJECT recognition (Computer vision), COMPUTER vision
- Abstract
Sign language recognition is one of the fundamental ways to assist deaf people to communicate with others. An accurate vision-based sign language recognition system using deep learning is a fundamental goal for many researchers. Deep convolutional neural networks have been extensively considered in the last few years, and a slew of architectures have been proposed. Recently, Vision Transformer and other Transformers have shown apparent advantages in object recognition compared to traditional computer vision models such as Faster R-CNN, YOLO, SSD, and other deep learning models. In this paper, we propose a Vision Transformer-based sign language recognition method called DETR (Detection Transformer), aiming to improve the current state-of-the-art sign language recognition accuracy. The DETR method proposed in this paper is able to recognize sign language from digital videos with high accuracy using a new deep learning model ResNet152 + FPN (i.e., Feature Pyramid Network), which is based on Detection Transformer. Our experiments show that the method has excellent potential for improving sign language recognition accuracy. For instance, our newly proposed net ResNet152 + FPN is able to enhance the detection accuracy by up to 1.70% on the test dataset of sign language compared to the standard Detection Transformer models. Besides, an overall accuracy of 96.45% was attained by using the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2023
38. Towards Data-Efficient Detection Transformers
- Author
Wang, Wen, Zhang, Jing, Cao, Yang, Shen, Yongliang, Tao, Dacheng, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Avidan, Shai, editor, Brostow, Gabriel, editor, Cissé, Moustapha, editor, Farinella, Giovanni Maria, editor, and Hassner, Tal, editor
- Published
- 2022
39. Towards Hard-Positive Query Mining for DETR-Based Human-Object Interaction Detection
- Author
Zhong, Xubin, Ding, Changxing, Li, Zijian, Huang, Shaoli, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Avidan, Shai, editor, Brostow, Gabriel, editor, Cissé, Moustapha, editor, Farinella, Giovanni Maria, editor, and Hassner, Tal, editor
- Published
- 2022
40. Conditional Context-Aware Feature Alignment for Domain Adaptive Detection Transformer
- Author
Chen, Siyuan, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Þór Jónsson, Björn, editor, Gurrin, Cathal, editor, Tran, Minh-Triet, editor, Dang-Nguyen, Duc-Tien, editor, Hu, Anita Min-Chun, editor, Huynh Thi Thanh, Binh, editor, and Huet, Benoit, editor
- Published
- 2022
41. Machine learning based augmented reality for improved learning application through object detection algorithms.
- Author
Hanafi, Anasse, Elaachak, Lotfi, and Bouhorma, Mohammed
- Subjects
OBJECT recognition (Computer vision), COMPUTER vision, AUGMENTED reality, MACHINE learning, BASES (Architecture), CONVOLUTIONAL neural networks
- Abstract
Detection of objects and their location in an image are important elements of current research in computer vision. In May 2020, Meta released its state-of-the-art object-detection model based on a transformer architecture called detection transformer (DETR). There are several object-detection models such as region-based convolutional neural network (R-CNN), you only look once (YOLO) and single shot detectors (SSD), but none have used a transformer to accomplish this task. The models mentioned earlier use all sorts of hyperparameters and layers. However, the advantages of using a transformer pattern make the architecture simple and easy to implement. In this paper, we determine the name of a chemical experiment in two steps: first, by building a DETR model trained on a customized dataset, and then by integrating it into an augmented reality mobile application. By detecting the objects used during the realization of an experiment, we can predict the name of the experiment using a multi-class classification approach. The combination of various computer vision techniques with augmented reality is indeed promising and offers a better user experience. [ABSTRACT FROM AUTHOR]
- Published
- 2023
42. TSD-DETR: A lightweight real-time detection transformer of traffic sign detection for long-range perception of autonomous driving.
- Author
Zhang, Lili, Yang, Kang, Han, Yucheng, Li, Jing, Wei, Wei, Tan, Hongxin, Yu, Pei, Zhang, Ke, and Yang, Xudong
- Subjects
TRAFFIC monitoring, FEATURE extraction, AUTONOMOUS vehicles, MULTISCALE modeling, DECISION making, TRAFFIC signs & signals
- Abstract
The key to accurate perception and efficient decision making in autonomous driving is the long-range detection of traffic signs. Long-range detection of traffic signs faces the problems of small traffic sign size and complex backgrounds. To solve these problems, this paper proposes a lightweight model for traffic sign detection (TSD-DETR) based on the real-time detection transformer. First, the feature extraction module is constructed using multiple types of convolutional modules. The model extracts multi-scale features at different levels to enhance its feature extraction ability. Then, a small object detection module and detection head are designed to extract and detect shallow features, which improves the detection of small traffic signs. Finally, Efficient Multi-Scale Attention is introduced to adjust the channel weights; it aggregates the output features of three parallel branches interactively. TSD-DETR achieves a mean average precision (mAP) of 96.8% on the Tsinghua-Tencent 100K dataset, an improvement of 2.5% over the real-time detection transformer. In small object detection, mAP improves by 9%. TSD-DETR achieves 99.4% mAP on the Changsha University of Science and Technology Chinese Traffic Sign Detection Benchmark dataset, an improvement of 0.6%. The experimental results show that TSD-DETR reduces the number of parameters by 9.06M by optimizing the model structure. While the real-time performance of the model is maintained, its detection accuracy is greatly improved. The results of ablation experiments show that the feature extraction module and small object detection module proposed in this paper are conducive to improving the detection accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2025
43. Enhancing Object Detection in Remote Sensing: A Hybrid YOLOv7 and Transformer Approach with Automatic Model Selection
- Author
Mahmoud Ahmed, Naser El-Sheimy, Henry Leung, and Adel Moussa
- Subjects
object detection, detection transformer, YOLOv7, multimodalities, Science
- Abstract
In the remote sensing field, object detection holds immense value for applications such as land use classification, disaster monitoring, and infrastructure planning, where accurate and efficient identification of objects within images is essential for informed decision making. However, achieving high-precision object localization is challenging, because even minor errors at the pixel level can significantly impact ground distance measurements. To address this critical challenge, our research introduces an innovative hybrid approach that combines the capabilities of the You Only Look Once version 7 (YOLOv7) and DEtection TRansformer (DETR) algorithms. By bridging the gap between local receptive field and global context, our approach not only enhances overall object detection accuracy but also promotes precise object localization, a key requirement in the field of remote sensing. Furthermore, a key advantage of our approach is the introduction of an automatic selection module which serves as an intelligent decision-making component. This module optimizes the selection process between YOLOv7 and DETR, and further improves object detection accuracy. Finally, we validate the improved performance of our new hybrid approach through empirical experimentation, and thus confirm its contribution to the field of target recognition and detection in remote sensing images.
- Published
- 2023
44. Vehicle Density Prediction in Low Quality Videos with Transformer Timeseries Prediction Model (TTPM).
- Author
Suvitha, D. and Vijayalakshmi, M.
- Subjects
SURVEILLANCE detection, TRAFFIC congestion, OPTICAL character recognition, ERROR rates, DATA encryption
- Abstract
Recent advances in low-cost cameras have facilitated surveillance in various developing towns in India. The videos obtained from such surveillance are of low quality. Still, counting vehicles from such videos is necessary to avoid traffic congestion and to allow drivers to plan their routes more precisely. On the other hand, detecting vehicles in such low-quality videos is highly challenging for vision-based methodologies. In this research, a meticulous attempt is made to use low-quality videos to describe traffic in Salem town in India, a setting that most available sources have not addressed. In this work, the Detection Transformer (DETR) model is used for object (vehicle) detection. Vehicles are predicted in a rush-hour traffic video using a set of loss functions that perform bipartite matching between predicted and ground-truth objects. The date and time shown on every frame of the traffic footage are detected and retrieved using Tesseract Optical Character Recognition. The date and time extracted from the input image are combined with the number of objects recognized by the DETR model, which yields a timestamped vehicle report. A Transformer Timeseries Prediction Model (TTPM) is proposed to predict future vehicle density; here the regular NLP layers have been removed and the temporal encoding layer has been modified. The proposed TTPM outperforms the existing models in error rate, with an RMSE of 4.313 and an MAE of 3.812. [ABSTRACT FROM AUTHOR]
- Published
- 2023
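The TTPM abstract relies on DETR's set-based loss, in which each ground-truth object is assigned to exactly one prediction by minimum-cost bipartite matching. The sketch below illustrates only that assignment step, under simplifying assumptions: the cost is a hypothetical mix of negative class probability and L1 box distance with made-up weights (the original DETR matching cost also includes a generalized-IoU term), and the search is exhaustive, which is practical only for a handful of boxes. The official DETR code solves the same assignment problem with the Hungarian algorithm via `scipy.optimize.linear_sum_assignment`.

```python
from itertools import permutations


def match_cost(pred, gt, w_cls=1.0, w_box=5.0):
    """Hypothetical pairwise cost: -w_cls * p(gt class) + w_box * L1 box distance.

    pred: (class_probs, box) with class_probs a dict label -> probability
    gt:   (label, box); boxes are (cx, cy, w, h) tuples.
    """
    probs, pbox = pred
    label, gbox = gt
    l1 = sum(abs(a - b) for a, b in zip(pbox, gbox))
    return -w_cls * probs.get(label, 0.0) + w_box * l1


def bipartite_match(preds, gts):
    """Minimum-cost one-to-one assignment of ground truths to predictions.

    Exhaustive search over assignments, so only usable for a few boxes.
    Returns (pred_index, gt_index) pairs; predictions left unmatched would be
    supervised as "no object" in DETR-style training.
    """
    assert len(preds) >= len(gts), "need at least as many predictions as objects"
    best, best_cost = (), float("inf")
    # chosen[g] is the prediction index assigned to ground truth g.
    for chosen in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[p], gts[g]) for g, p in enumerate(chosen))
        if cost < best_cost:
            best, best_cost = chosen, cost
    return [(p, g) for g, p in enumerate(best)]
```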
45. Cascading Alignment for Unsupervised Domain-Adaptive DETR with Improved DeNoising Anchor Boxes.
- Author
Geng, Huantong, Jiang, Jun, Shen, Junye, and Hou, Mengmeng
- Subjects
DEEP-sea corals, IMAGE denoising
- Abstract
Transformer-based object detection has recently attracted increasing interest and shown promising results. As one of the DETR-like models, DETR with improved denoising anchor boxes (DINO) produced superior performance on COCO val2017 and achieved a new state of the art. However, it often encounters challenges when applied to new scenarios where no annotated data is available, and the imaging conditions differ significantly. To alleviate this problem of domain shift, in this paper, unsupervised domain adaptive DINO via cascading alignment (CA-DINO) was proposed, which consists of attention-enhanced double discriminators (AEDD) and weak-restraints on category-level token (WROT). Specifically, AEDD is used to aggregate and align the local–global context from the feature representations of both domains while reducing the domain discrepancy before entering the transformer encoder and decoder. WROT extends Deep CORAL loss to adapt class tokens after embedding, minimizing the difference in second-order statistics between the source and target domain. Our approach is trained end to end, and experiments on two challenging benchmarks demonstrate the effectiveness of our method, which yields 41% relative improvement compared to baseline on the benchmark dataset Foggy Cityscapes, in particular. [ABSTRACT FROM AUTHOR]
- Published
- 2022
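CA-DINO's WROT component is described as extending the Deep CORAL loss, which aligns the second-order statistics (feature covariances) of the source and target domains. The abstract does not detail the extension to class tokens, so this sketch shows only the original Deep CORAL term, loss = ||C_S - C_T||_F^2 / (4 d^2), where C_S and C_T are the source and target feature covariance matrices and d is the feature dimension. It operates on plain lists of d-dimensional feature vectors, and the helper names are illustrative.

```python
def covariance(features):
    """Unbiased covariance matrix (d x d) of a list of d-dimensional vectors."""
    n, d = len(features), len(features[0])
    mean = [sum(f[j] for f in features) / n for j in range(d)]
    centered = [[f[j] - mean[j] for j in range(d)] for f in features]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def coral_loss(source, target):
    """Sketch of the original Deep CORAL term: ||C_S - C_T||_F^2 / (4 d^2).

    Not CA-DINO's WROT extension, which the abstract does not spell out.
    """
    d = len(source[0])
    cs, ct = covariance(source), covariance(target)
    frob_sq = sum((cs[i][j] - ct[i][j]) ** 2 for i in range(d) for j in range(d))
    return frob_sq / (4 * d * d)
```

Because only covariances enter the loss, shifting every target feature by a constant leaves it unchanged; in training, such a term would be computed on mini-batch features and added to the detection loss with a weighting factor.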
46. DS-DETR: A Model for Tomato Leaf Disease Segmentation and Damage Evaluation.
- Author
Wu, Jianshuang, Wen, Changji, Chen, Hongrui, Ma, Zhenyu, Zhang, Tian, Su, Hengqiang, and Yang, Ce
- Subjects
LEAF spots, PLANT diseases, PLANT classification, NOSOLOGY, PROBLEM solving, BLIGHT diseases (Botany)
- Abstract
Early blight and late blight are important factors restricting tomato yield. However, it is still a challenge to accurately and objectively detect and segment crop diseases in order to evaluate disease damage. In this paper, the Disease Segmentation Detection Transformer (DS-DETR) is proposed to segment leaf disease spots efficiently based on several improvements to DETR. Additionally, a damage assessment is carried out by the area ratio of the segmented leaves to the disease spots. First, an unsupervised pre-training method was introduced into DETR with the Plant Disease Classification Dataset (PDCD) to solve the problem of the long training epochs and slow convergence speed of DETR. This method can train the Transformer structures in advance to obtain leaf disease features. Loading the pre-trained model weights into DS-DETR can speed up the convergence of the model. Then, Spatially Modulated Co-Attention (SMCA) was used to assign Gaussian-like spatial weights to the query box of DS-DETR. The different positions in the image are trained using the query boxes with different weights to improve the accuracy of the model. Finally, an improved relative position code was added to the Transformer structure of DS-DETR. Relative position coding promotes the capture of the sequence order of input tokens by the Transformer. The spatial location feature is strengthened by establishing the location relationship between different instances. Based on these improvements, the DS-DETR model was tested on the Tomato leaf Disease Segmentation Dataset (TDSD) constructed by us. The experimental results show that the DS-DETR proposed by us achieved 0.6823 for APmask, an improvement of 12.87%, 8.25%, 3.67%, 1.95%, 10.27%, and 9.52% over the state-of-the-art methods Mask RCNN, BlendMask, CondInst, SOLOv2, ISTR, and DETR, respectively. In addition, the disease grading accuracy reached 0.9640 according to the segmentation results given by our proposed model. [ABSTRACT FROM AUTHOR]
- Published
- 2022
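DS-DETR grades damage from the area ratio between the segmented leaf and its disease spots. A minimal sketch of that measurement, assuming binary masks stored as nested lists and expressing the ratio as the diseased fraction of the leaf (the reciprocal of the leaf-to-spot ratio named in the abstract), could look like this. The helper name is hypothetical, and the grade thresholds the paper maps this ratio to are not given here, so none are shown.

```python
def lesion_area_ratio(leaf_mask, lesion_mask):
    """Hypothetical helper: fraction of leaf pixels that are also marked as lesion.

    Both masks are equally sized 2-D lists of 0/1 values (for example,
    thresholded segmentation outputs). Lesion pixels outside the leaf are ignored.
    """
    leaf_px = lesion_px = 0
    for leaf_row, lesion_row in zip(leaf_mask, lesion_mask):
        for is_leaf, is_lesion in zip(leaf_row, lesion_row):
            if is_leaf:
                leaf_px += 1
                if is_lesion:
                    lesion_px += 1
    # Guard against an empty leaf mask.
    return lesion_px / leaf_px if leaf_px else 0.0
```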
47. IFS-DETR: A real-time industrial fire smoke detection algorithm based on an end-to-end structured network.
- Author
Chen, JiaSheng, Han, HuiZi, Liu, Mei, Su, Peng, and Chen, Xi
- Subjects
DETECTION algorithms, TRANSFORMER models, FIRE prevention, FEATURE extraction, ECONOMIC equilibrium, FIRE detectors
- Abstract
Fire prevention in industrial settings is paramount for ensuring human safety and economic stability. However, current mainstream DETR detectors face significant challenges in applications due to the necessity for extensive memory accesses and inference delays. To tackle these complexities, we propose an Industrial Fire Smoke Detector based on an end-to-end structured framework. In a series of innovative optimizations, we first adopt a more lightweight backbone network, LeanNet, for feature extraction. Combined with the optimized Transformer architecture, this approach enhances the model's detection speed to address real-time challenges effectively. Secondly, we introduce a Feature Fusion Network based on alignment mechanisms to enhance the DETR model's multi-scale object representation capabilities without significantly increasing latency. Subsequently, to facilitate easier training and optimization of IFS-DETR, we introduce IoU-aware query selection and an aspect ratio-based denoising training strategy, and enhance the localization loss function using Inner-SIoU. Finally, we deploy IFS-DETR on NVIDIA Jetson Orin Nano. The dataset is available at https://github.com/Sonnenb1ume/IFS-DETR. • LeanNet, an efficient backbone network based on depth-separable convolution. • AFFNet, a feature fusion network based on alignment mechanism and EFCA attention. • Improve training efficiency with more effective training strategies and box losses. • Deployed IFS-DETR in edge devices for real-time applications. [ABSTRACT FROM AUTHOR]
- Published
- 2025
48. Adaptive token selection for efficient detection transformer with dual teacher supervision.
- Author
Yuan, Muyao, Zhang, Weizhan, Yan, Caixia, Gong, Tieliang, Zhang, Yuanhong, and Ying, Jiangyong
- Subjects
TEACHER training, DISTILLATION, SUPPLY & demand, TEACHERS, ELECTIONS
- Abstract
Recently, DEtection TRansformer (DETR)-based models have obtained remarkable performance in object detection and various foundational vision tasks. However, their performance is impeded by high computational demands, since their computation scales quadratically with the number of feature tokens. To mitigate redundant computations in some areas like the background, existing works propose static token selection methods, which choose a predefined portion of tokens to forward. However, it is intuitive that the complexity of inference for detection tasks varies depending on the input images. Static token selection methods rely on a fixed keeping ratio, causing performance degradation in complex scenes and inefficiency in simple scenes. To address this issue, we propose an Adaptive Token Selection method for DETR (ATS-DETR) that dynamically chooses the token keeping ratio based on the complexity of the input to retain the most salient tokens. To explicitly control the sparsity and improve the performance of ATS-DETR, we put forward a novel approach called Dual Teacher Supervision to train the ATS-DETR. Specifically, we utilize a weak teacher to assist the model in distinguishing input complexity and a strong teacher for enhancing overall model performance through feature distillation. We further introduce the Global Distillation to diminish the disparities between the feature patterns extracted from ATS-DETR and the strong teacher model. Extensive experiments demonstrate that ATS-DETR attains better performance compared to Deformable DETR while achieving an 83% reduction of GFLOPs in the encoder, and outperforms all the static token selection methods. • Methodology for adaptive token selection in DETR based on input difficulty. • An input-aware adaptive training strategy with dual teacher supervision. • A distillation approach to diminish the disparities in the feature patterns extracted from adaptive and static models.
• Adaptive token selection demonstrates superiority over static token selection methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
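The contrast ATS-DETR draws between static and adaptive token selection can be sketched as a top-k filter whose k depends on the input. In this hypothetical helper, the per-token saliency scores and the scalar complexity estimate are assumed to come from learned modules that are not shown (the abstract trains them with dual teacher supervision), and the linear mapping from complexity to keep ratio is an illustrative choice rather than the paper's method.

```python
def select_tokens(saliency, complexity, min_keep=0.1, max_keep=1.0):
    """Hypothetical helper: keep the top-k salient tokens, with k driven by complexity.

    saliency:   one score per encoder token (assumed to come from a learned scorer).
    complexity: estimated input difficulty, clamped to [0, 1]; the linear map
                from complexity to keep ratio below is an illustrative choice.
    Returns the kept token indices in their original order.
    """
    complexity = min(max(complexity, 0.0), 1.0)
    keep_ratio = min_keep + (max_keep - min_keep) * complexity
    k = max(1, round(keep_ratio * len(saliency)))
    # Rank tokens by saliency, keep the k best, then restore original order.
    ranked = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    return sorted(ranked[:k])
```

Calling it with a constant `complexity` for every image reproduces a static, fixed-ratio selector; letting the value vary per image is what makes the keep ratio adaptive.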
49. Towards automatic identification of ship-ice contact using acceleration measurement.
- Author
Ma, Qun, Cui, Meng, Li, Fang, Zhou, Li, and Ding, Shifeng
- Subjects
ACCELERATION measurements, AUTOMATIC identification, ICE fields, FOURIER transforms, ICEBREAKERS (Ships), ICE navigation, NAVAL architecture
- Abstract
When a ship navigates in ice-covered areas, its hull experiences motion and vibration responses due to ship-ice contact. The magnitude of such responses can be represented well by acceleration, which is easy to measure onboard ice-going ships. The level of acceleration is determined by the environmental and operational conditions through excitation via the ship-ice contact. Conversely, it is possible to infer excitation and environmental conditions using acceleration measurements via appropriate methods. Accelerations associated with such responses can also affect the functionality of the onboard equipment and therefore, should be quantified. Nevertheless, studies on shipborne acceleration measurements in ice fields and their links to prevailing excitations and environmental conditions are rare. This study explores the feasibility of using shipborne acceleration measurements to infer the corresponding excitations (i.e., ship-ice contacts) as a first step towards the inference of ice conditions via acceleration measurements. An automatic identification method for ship-ice contact is proposed that can effectively separate ship-ice contact events from acceleration measurement signals. The acceleration signals are decomposed in the time-frequency map obtained through a short-time Fourier transformation (STFT). An improved detection transformer (DETR) model was used to automatically identify ship-ice contact events and achieved good accuracy in the validation set. This model was then applied to automatically identify ship-ice contacts using acceleration signals under various environmental and operational conditions. The frequency of the identified ship-ice contact reasonably reflects its dependence on prevailing environmental and operational conditions, which confirms the validity of the proposed methodology. • A novel concept is proposed to identify ship-ice contacts from onboard acceleration measurements. 
• An improved DETR model is applied to the time-frequency map obtained by STFT. • The DETR model is applied to various ice conditions and the ship-ice contact frequencies are analyzed. • The study quantifies the impact of ship speed and ice concentration on ship-ice contact frequency. [ABSTRACT FROM AUTHOR]
- Published
- 2024
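The ship-ice study converts acceleration signals into time-frequency maps with a short-time Fourier transform before passing them to its DETR model. The sketch below computes a magnitude STFT with a Hann window using a direct DFT, so it needs only the standard library; the function name, frame length, and hop size are placeholders because the abstract does not report them, and a practical pipeline would use an FFT-based routine instead.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=64, hop=32):
    """Magnitude STFT with a periodic Hann window, computed with a direct DFT.

    frame_len and hop are placeholder values, not taken from the paper.
    Returns a list of frames; each frame holds |X[k]| for k = 0..frame_len // 2.
    The direct O(frame_len^2) DFT per frame keeps the sketch dependency-free.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):  # non-negative frequency bins only
            acc = sum(
                seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames
```

Stacking the returned frames gives a 2-D time-by-frequency array that a detector can treat as a single-channel image.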
50. An improved defect recognition framework for casting based on DETR algorithm
- Author
Zhang, Long, Yan, Sai-fei, Hong, Jun, Xie, Qian, Zhou, Fei, and Ran, Song-lin
- Published
- 2023
Discovery Service for Jio Institute Digital Library