Computer vision-based hybrid efficient convolution for isolated dynamic sign language recognition.
- Author
- Chowdhury, Prothoma Khan; Oyshe, Kabiratun Ummi; Rahaman, Muhammad Aminur; Debnath, Tanoy; Rahman, Anichur; Kumar, Neeraj
- Subjects
- SIGN language; BENGALI language; HYBRID computers (Computer architecture); VIDEO excerpts; SPEECH
- Abstract
Isolated dynamic sign language recognition (IDSLR) has the potential to transform accessibility and inclusion by enabling speech- and/or hearing-impaired people to engage more fully in many spheres of life, including social interactions and work. IDSLR is challenging because a single gesture requires analyzing a sequence of image frames with multiple linguistic features, often against cluttered backgrounds and under varying illumination. We propose a Hybrid Efficient Convolution (HEC) model that combines EfficientNet-B3 with a few modified layers as an alternative to traditional machine learning techniques, improving performance in cluttered backgrounds and illumination-variation environments. The HEC architecture integrates pre-trained EfficientNet-B3 layers loaded with customized weights and a new custom dense layer of 256 units, followed by batch normalization, dropout, and the final output layer. To enhance the robustness of the system, we apply augmentation during pre-processing. The system then performs channel-wise feature transformation through point-wise convolution, which reduces computational complexity and increases accuracy. The updated 256-unit dense layer processes the output of the standard EfficientNet-B3, shaping the model into a hybrid form for better performance. We created our own gesture dataset, "BdSL_OPA_23_GESTURES," consisting of 6000 video clips of 100 isolated dynamic Bangla Sign Language words (60 videos per word from 20 different people), recorded in cluttered backgrounds with illumination variation, to train and evaluate the proposed model. We use 80% of the dataset for training, while the remaining 20% is dedicated to testing and validation. Within a small number of epochs, the proposed HEC model achieves a superior accuracy of 93.17% on the "BdSL_OPA_23_GESTURES" dataset. The proposed model and dataset have been shared publicly with the scientific community at: https://github.com/Prothoma2001/Bangla-Continuous-Sign-Language-Recognition.git. [ABSTRACT FROM AUTHOR]
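The abstract describes the HEC architecture in enough detail to sketch the classifier head. Below is a minimal, hypothetical TensorFlow/Keras sketch of such a head; the input resolution, number of point-wise filters, dropout rate, freezing policy, and per-frame treatment of the video clips are assumptions, since the abstract does not specify them.

```python
# Minimal sketch of an HEC-style head: pre-trained EfficientNet-B3,
# point-wise (1x1) convolution, 256-unit dense layer, batch norm,
# dropout, and a softmax output over 100 sign classes.
# Assumptions (not from the paper): 300x300 input, 128 point-wise
# filters, 0.3 dropout, frozen backbone, per-frame classification.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 100  # 100 isolated dynamic Bangla Sign Language words

base = tf.keras.applications.EfficientNetB3(
    include_top=False, weights="imagenet", input_shape=(300, 300, 3))
base.trainable = False  # pre-trained layers; fine-tuning policy is assumed

model = models.Sequential([
    base,
    # Point-wise convolution: channel-wise feature transformation
    # with no spatial mixing.
    layers.Conv2D(128, kernel_size=1, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),  # custom 256-unit dense layer
    layers.BatchNormalization(),
    layers.Dropout(0.3),
    layers.Dense(NUM_CLASSES, activation="softmax"),  # final output layer
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

A 1x1 convolution recombines channels without spatial mixing, which is why it can raise accuracy at low computational cost; how the authors aggregate predictions across the frames of a video clip is not stated in the abstract.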
- Published
- 2024