An Evaluation of Large Pre-Trained Models for Gesture Recognition using Synthetic Videos
- Source :
- Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II. Vol. 13035. SPIE, 2024
- Publication Year :
- 2024
Abstract
- In this work, we explore the possibility of using synthetically generated data for video-based gesture recognition with large pre-trained models. We consider whether these models have sufficiently robust and expressive representation spaces to enable "training-free" classification. Specifically, we utilize various state-of-the-art video encoders to extract features for use in k-nearest neighbors classification, where the training data points are derived from synthetic videos only. We compare these results with another training-free approach -- zero-shot classification using text descriptions of each gesture. In our experiments with the RoCoG-v2 dataset, we find that using synthetic training videos yields significantly lower classification accuracy on real test videos compared to using a relatively small number of real training videos. We also observe that video backbones that were fine-tuned on classification tasks serve as superior feature extractors, and that the choice of fine-tuning data has a substantial impact on k-nearest neighbors performance. Lastly, we find that zero-shot text-based classification performs poorly on the gesture recognition task, as gestures are not easily described through natural language.
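The training-free pipeline the abstract describes -- extract features with a frozen video encoder, then classify by nearest neighbors against a bank of (synthetic) training features -- can be sketched as follows. This is a minimal illustration, not the paper's implementation: the random vectors stand in for encoder features, and the gesture labels and `k` value are placeholders.

```python
import numpy as np

def knn_classify(query_feats, train_feats, train_labels, k=5):
    """Label each query feature by majority vote among its k nearest
    training features (cosine similarity); no model training involved."""
    # L2-normalize so a dot product equals cosine similarity.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    t = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    sims = q @ t.T                          # (n_query, n_train) similarities
    nn = np.argsort(-sims, axis=1)[:, :k]   # indices of the k most similar
    preds = []
    for row in nn:
        votes = train_labels[row]
        labels, counts = np.unique(votes, return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)

# Toy demo: two well-separated clusters stand in for features of two
# gesture classes extracted from synthetic videos; real features would
# come from a pre-trained video backbone.
rng = np.random.default_rng(0)
train = np.vstack([rng.normal(0.0, 0.1, (20, 8)) + 1.0,
                   rng.normal(0.0, 0.1, (20, 8)) - 1.0])
labels = np.array([0] * 20 + [1] * 20)
query = np.vstack([np.ones((3, 8)), -np.ones((3, 8))])
print(knn_classify(query, train, labels, k=5))
```

Because the encoder is frozen and classification is a lookup, swapping the training bank from synthetic to real features (the paper's central comparison) requires no retraining, only recomputing the feature matrix.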
- Subjects :
- Computer Science - Computer Vision and Pattern Recognition
Details
- Database :
- arXiv
- Publication Type :
- Report
- Accession number :
- edsarx.2410.02152
- Document Type :
- Working Paper
- Full Text :
- https://doi.org/10.1117/12.3013530