
PMG-DETR: fast convergence of DETR with position-sensitive multi-scale attention and grouped queries.

Authors :
Cui, Shuming
Deng, Hongwei
Source :
Pattern Analysis & Applications, Jun 2024, Vol. 27, Issue 2, p1-14. 14p.
Publication Year :
2024

Abstract

The recently proposed DETR successfully applied the Transformer to object detection and achieved impressive results. However, its learned object queries explore the entire image when matching their corresponding regions, which makes DETR slow to converge. In addition, DETR uses only single-scale features from the final stage of the backbone network, which hurts small-object detection. To address these issues, we propose PMG-DETR, an effective training strategy for the DETR framework built on Position-sensitive Multi-scale attention and Grouped queries. First, to better fuse multi-scale features, we propose a position-sensitive multi-scale attention that incorporates a spatial sampling strategy into deformable attention, further improving small-object detection. Second, we extend the attention mechanism with a novel positional encoding scheme. Finally, we propose a grouping strategy for object queries: queries are grouped at the decoder side to cover regions of interest more precisely and to accelerate DETR convergence. Extensive experiments on the COCO dataset show that PMG-DETR outperforms DETR, e.g., reaching 47.8% AP with a ResNet-50 backbone trained for 50 epochs. Ablation studies on COCO validate the effectiveness of the proposed components.
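The record contains no code, so as a concrete illustration only, the following is a minimal PyTorch sketch of deformable-style attention with explicit spatial sampling, the mechanism family the abstract builds on. It is single-scale for brevity, and the class name, tensor shapes, and hyperparameters are assumptions, not the authors' implementation: each query predicts a few sampling offsets around its reference point, bilinearly samples the feature map there, and mixes the samples with learned weights.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialSamplingAttention(nn.Module):
    # Hypothetical single-scale sketch of deformable-style attention with
    # spatial sampling; not the PMG-DETR authors' code.
    def __init__(self, d_model=256, n_points=4):
        super().__init__()
        self.n_points = n_points
        self.offset_proj = nn.Linear(d_model, n_points * 2)  # (dx, dy) per sampling point
        self.weight_proj = nn.Linear(d_model, n_points)      # mixing weight per point
        self.value_proj = nn.Conv2d(d_model, d_model, 1)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, query, ref_points, feat):
        # query:      (B, Q, C)   object queries
        # ref_points: (B, Q, 2)   normalized (x, y) reference points in [0, 1]
        # feat:       (B, C, H, W) one backbone feature map
        B, Q, _ = query.shape
        value = self.value_proj(feat)                               # (B, C, H, W)
        offsets = self.offset_proj(query).view(B, Q, self.n_points, 2)
        weights = self.weight_proj(query).softmax(dim=-1)           # (B, Q, K)
        # Offsets live in normalized image coordinates; map to [-1, 1] for grid_sample.
        loc = (ref_points.unsqueeze(2) + offsets).clamp(0, 1) * 2 - 1
        sampled = F.grid_sample(value, loc, align_corners=False)    # (B, C, Q, K)
        out = (sampled * weights.unsqueeze(1)).sum(dim=-1)          # (B, C, Q)
        return self.out_proj(out.transpose(1, 2))                   # (B, Q, C)

In a multi-scale variant one would presumably run this module over each backbone level and combine the per-level outputs, with the paper's positional encoding added to the queries before the offset and weight projections; those details are not specified in this record.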
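The grouped-query strategy can likewise be sketched. One plausible reading, assumed here rather than taken from the paper, is group-wise decoder self-attention: the query set is split into groups and each query attends only within its own group, so each group can specialize on its own regions of interest. The function name grouped_self_attention and the group count are illustrative.

import torch
import torch.nn as nn

def grouped_self_attention(queries, attn, n_groups=4):
    # queries: (B, Q, C) with Q divisible by n_groups
    # attn:    nn.MultiheadAttention built with batch_first=True
    B, Q, C = queries.shape
    g = queries.reshape(B * n_groups, Q // n_groups, C)  # fold groups into the batch dim
    out, _ = attn(g, g, g)                               # attend within each group only
    return out.reshape(B, Q, C)

# Example: 300 object queries in 4 groups of 75.
attn = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
q = torch.randn(2, 300, 256)
print(grouped_self_attention(q, attn).shape)  # torch.Size([2, 300, 256])

Folding the groups into the batch dimension keeps the computation a single attention call while still preventing cross-group interaction, which is one cheap way such a grouping could speed up convergence.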

Details

Language :
English
ISSN :
1433-7541
Volume :
27
Issue :
2
Database :
Academic Search Index
Journal :
Pattern Analysis & Applications
Publication Type :
Academic Journal
Accession number :
177172103
Full Text :
https://doi.org/10.1007/s10044-024-01281-0