
An Evaluation of Large Pre-Trained Models for Gesture Recognition using Synthetic Videos

Authors:
Reddy, Arun
Shah, Ketul
Rivera, Corban
Paul, William
De Melo, Celso M.
Chellappa, Rama
Source:
Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II. Vol. 13035. SPIE, 2024
Publication Year:
2024

Abstract

In this work, we explore the possibility of using synthetically generated data for video-based gesture recognition with large pre-trained models. We consider whether these models have sufficiently robust and expressive representation spaces to enable "training-free" classification. Specifically, we utilize various state-of-the-art video encoders to extract features for use in k-nearest neighbors classification, where the training data points are derived from synthetic videos only. We compare these results with another training-free approach -- zero-shot classification using text descriptions of each gesture. In our experiments with the RoCoG-v2 dataset, we find that using synthetic training videos yields significantly lower classification accuracy on real test videos compared to using a relatively small number of real training videos. We also observe that video backbones that were fine-tuned on classification tasks serve as superior feature extractors, and that the choice of fine-tuning data has a substantial impact on k-nearest neighbors performance. Lastly, we find that zero-shot text-based classification performs poorly on the gesture recognition task, as gestures are not easily described through natural language.
Comment: Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II (SPIE Defense + Commercial Sensing, 2024)
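The "training-free" kNN protocol described above can be pictured with a minimal sketch (Python). The extract_features helper, file lists, class count, feature dimension, and the k=5 / cosine-distance settings are illustrative assumptions for the sketch, not details taken from the paper; random vectors stand in for encoder output so the example runs end to end.

# Sketch of the training-free protocol: a frozen, pre-trained video encoder
# produces one feature vector per clip, and real test clips are classified by
# k-nearest neighbors against a reference set built from synthetic clips only.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
NUM_CLASSES, FEAT_DIM = 7, 768  # assumed gesture-class count and embedding size

def extract_features(video_paths):
    """Stand-in for a frozen video backbone: in practice this would sample
    frames, forward them through the encoder, and pool into one vector per clip.
    Here it returns random vectors so the sketch is runnable."""
    return rng.normal(size=(len(video_paths), FEAT_DIM))

# Synthetic clips supply the kNN reference ("training") set; no weights are updated.
syn_paths = [f"synthetic_{i}.mp4" for i in range(200)]   # hypothetical paths
syn_labels = rng.integers(0, NUM_CLASSES, size=len(syn_paths))
syn_feats = extract_features(syn_paths)

# Real clips are used only for evaluation.
real_paths = [f"real_{i}.mp4" for i in range(50)]        # hypothetical paths
real_labels = rng.integers(0, NUM_CLASSES, size=len(real_paths))
real_feats = extract_features(real_paths)

knn = KNeighborsClassifier(n_neighbors=5, metric="cosine")
knn.fit(syn_feats, syn_labels)
print(f"synthetic-to-real kNN accuracy: {knn.score(real_feats, real_labels):.3f}")

Swapping the synthetic reference set for a small set of real training clips, or changing which fine-tuned backbone supplies extract_features, reproduces the kind of comparisons the abstract reports.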

Details

Database:
arXiv
Journal:
Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II. Vol. 13035. SPIE, 2024
Publication Type:
Report
Accession number:
edsarx.2410.02152
Document Type:
Working Paper
Full Text:
https://doi.org/10.1117/12.3013530