
A two stream convolutional neural network with bi-directional GRU model to classify dynamic hand gesture.

Authors :
Verma, Bindu
Source :
Journal of Visual Communication & Image Representation, Aug 2022, Vol. 87.
Publication Year :
2022

Abstract

Dynamic hand gesture recognition remains an active topic in the computer vision community. Any hand gesture can be represented by a set of feature vectors, and a Recurrent Neural Network (RNN) can recognize such a sequence of feature vectors as a gesture by analyzing its temporal and contextual information. We therefore propose a hybrid deep learning framework to recognize dynamic hand gestures, in which GoogLeNet is pipelined with a bidirectional GRU unit. A dynamic hand gesture consists of many frames, and features must be extracted from each frame to capture the temporal and dynamic information of the performed gesture. Since an RNN takes a sequence of feature vectors as input, we extract frame-level features from the videos using a pretrained GoogLeNet. As the Gated Recurrent Unit is a variant of the RNN suited to classifying sequential data, we build a feature sequence for each video and pass it to a bidirectional GRU (BGRU) network to classify the gestures. We evaluate our model on four publicly available hand gesture datasets. The proposed method performs well and is comparable with existing methods: we achieved 98.6% accuracy on the Northwestern University Hand Gesture (NWUHG) dataset, 99.6% on SKIG, and 99.4% on the Cambridge Hand Gesture (CHG) dataset. On the DHG14/28 dataset, we achieved 97.8% accuracy with 14 gesture classes and 92.1% with 28 gesture classes. DHG14/28 contains both skeleton and depth data; our proposed model used the depth data and achieved comparable accuracy. • The paper presents a two-stream hybrid model that uses a convolutional neural network to extract frame-level features and a BGRU to classify the dynamic hand gesture (see the sketch after the abstract). • A deeply coupled hybrid pipeline extracts spatial and temporal features. • The paper presents a comprehensive comparison of the BLSTM and BGRU models with different batch sizes. • Experimental results show that the BGRU model with optical flow and RGB videos outperforms the state of the art. [ABSTRACT FROM AUTHOR]
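
To make the described pipeline concrete, below is a minimal sketch of one stream of the CNN-to-BGRU architecture, assuming PyTorch and torchvision (≥ 0.13 for the weights API). Layer sizes, the class count, and names such as GestureBGRU are illustrative assumptions, not taken from the paper; only the overall shape (pretrained GoogLeNet frame features fed to a bidirectional GRU classifier) follows the abstract.

    # A minimal sketch of one stream: GoogLeNet frame features -> bidirectional GRU.
    # Hidden size, class count, and the class name GestureBGRU are hypothetical.
    import torch
    import torch.nn as nn
    from torchvision import models

    # Frame-level feature extractor: pretrained GoogLeNet with its classifier removed.
    backbone = models.googlenet(weights=models.GoogLeNet_Weights.DEFAULT)
    backbone.fc = nn.Identity()          # keep the 1024-d pooled features
    backbone.eval()

    class GestureBGRU(nn.Module):
        """Bidirectional GRU over per-frame CNN features (illustrative sizes)."""
        def __init__(self, feat_dim=1024, hidden=256, num_classes=14):
            super().__init__()
            self.bgru = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
            self.head = nn.Linear(2 * hidden, num_classes)  # forward + backward states

        def forward(self, feats):                 # feats: (batch, frames, feat_dim)
            out, _ = self.bgru(feats)
            return self.head(out[:, -1])          # classify from the final time step

    # Usage: turn a dummy clip (1 video, 16 frames, 3x224x224) into a feature
    # sequence, then classify it.
    video = torch.randn(1, 16, 3, 224, 224)
    with torch.no_grad():
        feats = backbone(video.flatten(0, 1)).reshape(1, 16, -1)
    logits = GestureBGRU()(feats)                 # shape: (1, num_classes)

In the paper's full two-stream setting, one would presumably run a stream like this over both the RGB frames and their optical-flow counterparts and fuse the two predictions; the fusion strategy is not specified in the abstract, so it is omitted here.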

Details

Language :
English
ISSN :
1047-3203
Volume :
87
Database :
Academic Search Index
Journal :
Journal of Visual Communication & Image Representation
Publication Type :
Academic Journal
Accession number :
158482396
Full Text :
https://doi.org/10.1016/j.jvcir.2022.103554