Deformable patch embedding-based shift module-enhanced transformer for panoramic action recognition.
- Source :
- Visual Computer; Aug2023, Vol. 39 Issue 8, p3247-3257, 11p
- Publication Year :
- 2023
Abstract
- 360° video action recognition is one of the most promising fields with the popularity of omnidirectional cameras. To obtain a more precise action understanding in panoramic scene, in this paper, we propose a deformable patch embedding-based temporal shift module-enhanced vision transformer model (DS-ViT), which aims to simultaneously eliminate the distortion effects caused by equirectangular projection (ERP) and construct temporal relationship among the video sequences. Panoramic action recognition is a practical but challenging domain for the lack of panoramic feature extraction methods. With deformable patch embedding, our scheme can adaptively learn the position offsets between different pixels, which effectively captures the distorted features. The temporal shift module facilitates temporal information exchanging by shifting part of the channels with zero parameters. Thanks to the powerful encoder, DS-ViT can efficiently learn the distorted features from the ERP inputs. Simulation results show that our proposed solution outperforms the state-of-the-art two-stream solution by an action accuracy of 9.29% and an activity accuracy of 8.18%, where the recent EgoK360 dataset is employed. [ABSTRACT FROM AUTHOR]
- Subjects :
- FEATURE extraction
RECOGNITION (Psychology)
INFORMATION sharing
Details
- Language :
- English
- ISSN :
- 01782789
- Volume :
- 39
- Issue :
- 8
- Database :
- Complementary Index
- Journal :
- Visual Computer
- Publication Type :
- Academic Journal
- Accession number :
- 170026911
- Full Text :
- https://doi.org/10.1007/s00371-023-02959-y
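
The abstract describes a temporal shift module that exchanges information between frames "by shifting part of the channels with zero parameters". A minimal sketch of that general idea follows; it is not the authors' implementation. The function name `temporal_shift`, the `(T, N, C)` token layout, the bidirectional shift, and the `fold_div=8` split are all assumptions made for illustration and are not stated in this record.

```python
import numpy as np

def temporal_shift(x, fold_div=8):
    """Shift a fraction of channels along the time axis (illustrative sketch).

    x: array of shape (T, N, C) = frames, tokens per frame, channels.
    fold_div: hypothetical parameter; 1/fold_div of the channels move in
    each temporal direction, the rest stay in place.
    """
    t, n, c = x.shape
    fold = c // fold_div
    out = np.zeros_like(x)
    # First `fold` channels: each frame receives them from the next frame.
    out[:-1, :, :fold] = x[1:, :, :fold]
    # Next `fold` channels: each frame receives them from the previous frame.
    out[1:, :, fold:2 * fold] = x[:-1, :, fold:2 * fold]
    # Remaining channels are copied unchanged.
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]
    return out

# Usage: 3 frames, 1 token, 8 channels.
frames = np.arange(3 * 1 * 8, dtype=float).reshape(3, 1, 8)
shifted = temporal_shift(frames)
```

The operation only moves existing values and zero-pads the boundary frames, so it adds no learnable weights, which is consistent with the "zero parameters" wording in the abstract.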