
Swin transformer-based traffic video text tracking.

Authors :: Yu, Jinyao
Qian, Jiangbo
Xin, Yu
Wang, Chong
Dong, Yihong
Source :: Applied Intelligence; Nov2024, Vol. 54 Issue 21, p10581-10595, 15p
Publication Year :: 2024
Abstract: Intelligent systems, such as driver assistance systems, can support safe driving by providing drivers with basic traffic, road-blockage, and possible route information. The goal of scene text tracking in driver assistance systems is to locate and track scene text, milestone signs, traffic panels, and road signs in real time. Therefore, the accuracy and real-time performance of scene text localization and tracking play vital roles in intelligent driving assistance systems. However, traffic video text tracking often suffers from missed and false detections caused by illumination changes, occlusion, and similar appearances. In this paper, we propose a new Swin transformer-based traffic video text tracking method, known as STVT, which is composed of a Siamese SwinDC transformer module, a deformable text detection module, and a text matching module. The Siamese SwinDC transformer module performs text detection by considering both the temporal and spatial dimensions, mitigating missed detections caused by occlusion. The text matching module combines the semantic, visual, and geometric features of text instances to effectively differentiate visually similar text instances. Extensive experiments demonstrated that the proposed STVT method outperformed state-of-the-art methods on various benchmark datasets. On the ICDAR2015 dataset, compared with the Free method, the mostly matched (MM) result increased by 32.0% (from 702 to 926), and the mostly lost (ML) result decreased by 33.2% (from 850 to 568). The visualization results demonstrated that the proposed STVT model can accurately detect and track occluded text instances in traffic videos. On the ICDAR2023 dataset, our method achieved a 6.01% improvement in MOTA compared with the TransDETR method, demonstrating that the proposed method is effective for small and dense text detection problems.
In addition, qualitative and quantitative analyses confirmed the effectiveness and real-time performance of our proposed STVT method. [ABSTRACT FROM AUTHOR]
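As a rough arithmetic check of the ICDAR2015 figures in the abstract (MM rising from 702 with the Free method to 926 with STVT, ML falling from 850 to 568), the sketch below recomputes the relative changes and spells out the standard CLEAR-MOT definition of MOTA. This is an illustration only, not code from the paper: the helper names `relative_change` and `mota` are hypothetical, the MOTA counts in the last line are invented for demonstration, and the abstract does not state whether the 6.01% MOTA gain over TransDETR is measured in absolute points or in relative terms.

```python
def relative_change(baseline: float, new: float) -> float:
    """Percentage change from baseline to new (positive means an increase)."""
    return (new - baseline) / baseline * 100.0


def mota(fn: int, fp: int, idsw: int, num_gt: int) -> float:
    """CLEAR-MOT multiple object tracking accuracy.

    MOTA = 1 - (FN + FP + IDSW) / GT, where each term is summed over all
    frames: FN = misses, FP = false positives, IDSW = identity switches,
    GT = number of ground-truth objects.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (fn + fp + idsw) / num_gt


# Counts quoted in the abstract for ICDAR2015 (Free baseline -> STVT).
mm_change = relative_change(702, 926)  # mostly matched: higher is better
ml_change = relative_change(850, 568)  # mostly lost: lower is better

print(f"MM change: {mm_change:+.1f}%")
print(f"ML change: {ml_change:+.1f}%")

# Hypothetical counts, for illustrating the MOTA formula only.
print(f"MOTA: {mota(fn=100, fp=50, idsw=10, num_gt=1000):.2f}")
```

Run as-is, this prints an MM change of +31.9% and an ML change of -33.2%. The ML figure matches the abstract exactly; the MM figure is close to, though not identical with, the reported 32.0%, which may reflect a slightly different rounding or computation in the paper. Because MOTA subtracts an error ratio from 1, it can be negative when errors exceed the number of ground-truth objects, which is why reported MOTA gains are usually read as differences in points.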