
DeepActsNet: A deep ensemble framework combining features from face, hands, and body for action recognition.

Authors :
Asif, Umar
Mehta, Deval
Von Cavallar, Stefan
Tang, Jianbin
Harrer, Stefan
Source :
Pattern Recognition, Vol. 139, July 2023.
Publication Year :
2023

Abstract

• We present "Deep Action Stamps (DeepActs)", a novel data representation that encodes actions in terms of spatial and motion information extracted from the face, hands, and body. To the best of our knowledge, this is the first work that models the spatial and temporal dependencies between facial, hand, and body joints for action recognition.

• We present DeepActsNet, an ensemble of Enhanced Convolutional Graph Networks (ECGN) that learns convolutional and structural features from the different feature channels of Deep Action Stamps. We also develop a lightweight strong baseline that outperforms previous methods in both recognition accuracy and computational efficiency.

• We present ablation studies on the benefits of combining spatial and motion information from the face, hands, and body, and on the significance of ensembling convolutional and structural features for improving the accuracy of challenging action classes. Experiments on three public datasets show that our contributions consistently exceed state-of-the-art performance on all datasets by considerable margins.

Human action recognition from videos has gained substantial attention due to its wide applications in video understanding. Most existing approaches extract human skeleton data from videos to encode actions because skeleton information is invariant to lighting conditions and background changes. Despite their success in achieving high recognition accuracy, methods based on a limited set of body joints fail to capture the nuances of subtle body parts that are highly relevant for discriminating similar actions. In this paper, we overcome this limitation by presenting a holistic framework that combines spatial and motion features from the body, face, and hands into a novel data representation termed "Deep Action Stamps (DeepActs)" for video-based action recognition.
Compared to skeleton sequences based on a limited set of body joints, DeepActs encode more effective spatio-temporal features that provide robustness against pose estimation noise and improve action recognition accuracy. We also present "DeepActsNet", a deep-learning-based ensemble model which learns convolutional and structural features from Deep Action Stamps for highly accurate action recognition. Experiments on three challenging action recognition datasets (NTU60, NTU120, and SYSU) show that the proposed model yields significant improvements in action recognition accuracy at lower computational cost than state-of-the-art methods. [ABSTRACT FROM AUTHOR]
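To make the idea of combining spatial and motion channels concrete, the following is a minimal sketch (not the paper's implementation) of building an action-stamp-style tensor from per-frame 2D keypoints: spatial channels are the (x, y) coordinates of body, face, and hand joints, and motion channels are their frame-to-frame differences. The joint counts and array shapes are assumptions for illustration only.

```python
import numpy as np

# Assumed joint counts for body, face, and hands (illustrative, not the
# paper's exact configuration).
N_BODY, N_FACE, N_HANDS = 25, 70, 42
N_JOINTS = N_BODY + N_FACE + N_HANDS  # 137 joints per frame

def make_action_stamp(keypoints: np.ndarray) -> np.ndarray:
    """Stack spatial and motion channels into one tensor.

    keypoints: (T, N_JOINTS, 2) array of (x, y) pose estimates per frame.
    Returns a (T, N_JOINTS, 4) tensor: channels 0-1 are the spatial
    coordinates, channels 2-3 are frame-to-frame motion (zero at frame 0).
    """
    motion = np.zeros_like(keypoints)
    motion[1:] = keypoints[1:] - keypoints[:-1]  # temporal differences
    return np.concatenate([keypoints, motion], axis=-1)

# Example: 16 frames of (hypothetical) pose-estimator output.
frames = np.random.rand(16, N_JOINTS, 2)
stamp = make_action_stamp(frames)
print(stamp.shape)  # (16, 137, 4)
```

A tensor of this form can then be fed to a convolutional or graph network; the paper's actual representation and feature channels may differ in detail.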

Details

Language :
English
ISSN :
00313203
Volume :
139
Database :
Academic Search Index
Journal :
Pattern Recognition
Publication Type :
Academic Journal
Accession number :
162848520
Full Text :
https://doi.org/10.1016/j.patcog.2023.109484