1. TransBoNet: Learning camera localization with Transformer Bottleneck and Attention.
- Author
-
Song, Xiaogang, Li, Hongjuan, Liang, Li, Shi, Weiwei, Xie, Guo, Lu, Xiaofeng, and Hei, Xinhong
- Subjects
-
*TRANSFORMER models, *DEEP learning, *CAMERAS, *FEATURE extraction, *LOCALIZATION (Mathematics), *SINGLE-degree-of-freedom systems, *AUTONOMOUS vehicles
- Abstract
6DoF camera localization is an important component of autonomous driving and navigation. Deep learning has achieved impressive results in localization, but its robustness in dynamic environments has not been adequately addressed. In this paper, we propose a framework based on a hybrid attention mechanism that can be generally applied to existing CNN-based pose regressors to improve their robustness in dynamic environments. Specifically, we propose a novel Transformer Bottleneck (TBo) block including convolution, channel attention, and a position-aware self-attention mechanism, which extracts more geometrically robust features by capturing the corresponding long-term dependencies between pixels. Furthermore, we introduce shuffle attention (SA) before the pose regressor, which integrates feature information in both spatial and channel dimensions, forcing the network to learn geometrically robust features and reducing the effects of dynamic objects and illumination conditions to improve camera localization accuracy. We evaluate our method on commonly used indoor and outdoor benchmark datasets, and the experimental results show that our proposed method significantly improves localization performance and compares favorably with contemporary pose regression schemes. In addition, extensive ablation evaluations are conducted to demonstrate the effectiveness of our proposed hybrid attention bottleneck block for pose regression networks.
• We propose a novel Transformer Bottleneck block with self-attention and channel attention to overcome the limitations of convolution. This coupling allows them to be optimized in a mutually reinforcing manner, significantly improving fine-grained feature extraction for accurate localization.
• We propose a novel end-to-end hybrid attention network for single image localization, which improves the accuracy and robustness of camera localization, especially in dynamic scenes.
• We conduct extensive experiments on both indoor and outdoor datasets, which show that our model performs better than existing competitive methods. [ABSTRACT FROM AUTHOR]
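The abstract names two of the ingredients of the TBo block, channel attention and position-aware self-attention, without giving their details. The following is a minimal NumPy sketch of how such ingredients are commonly combined, not the authors' architecture: it assumes squeeze-and-excitation-style channel gating, single-head scaled dot-product attention with an additive positional bias, and a residual connection. The convolution and the shuffle attention stage are omitted, and all function names, shapes, and the random initialization are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def channel_attention(feat, w1, w2):
    # Assumed SE-style gating: pool each channel, pass through a small MLP,
    # and rescale channels by a sigmoid gate. feat has shape (C, H, W).
    squeezed = feat.mean(axis=(1, 2))              # (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)        # ReLU, (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid, (C,)
    return feat * gate[:, None, None]


def position_aware_self_attention(feat, wq, wk, wv, pos_bias):
    # Treat every pixel as a token; pos_bias (N, N) stands in for a learned
    # positional term added to the attention logits (an assumption).
    c, h, w = feat.shape
    tokens = feat.reshape(c, h * w).T              # (N, C), N = H * W
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1]) + pos_bias
    attn = softmax(scores, axis=-1)                # each row sums to 1
    out = attn @ v                                 # (N, C)
    return out.T.reshape(c, h, w), attn


def hybrid_block(feat, params):
    # Channel gating, then pixel-wise self-attention, then a residual add.
    gated = channel_attention(feat, params["w1"], params["w2"])
    attended, attn = position_aware_self_attention(
        gated, params["wq"], params["wk"], params["wv"], params["pos_bias"]
    )
    return feat + attended, attn


def init_params(c, h, w, reduction=2, seed=0):
    # Random weights for illustration only; a real model would learn these.
    rng = np.random.default_rng(seed)
    n = h * w
    return {
        "w1": rng.standard_normal((c // reduction, c)) * 0.1,
        "w2": rng.standard_normal((c, c // reduction)) * 0.1,
        "wq": rng.standard_normal((c, c)) * 0.1,
        "wk": rng.standard_normal((c, c)) * 0.1,
        "wv": rng.standard_normal((c, c)) * 0.1,
        "pos_bias": rng.standard_normal((n, n)) * 0.1,
    }


feat = np.random.default_rng(1).standard_normal((8, 4, 4))  # toy feature map
out, attn = hybrid_block(feat, init_params(8, 4, 4))
print(out.shape, attn.shape)
```

Because every pixel attends to every other pixel, the attention matrix grows quadratically with the number of pixels, which is one reason such blocks are usually placed at low-resolution stages of a CNN backbone.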
- Published
- 2024