Lip-reading with Densely Connected Temporal Convolutional Networks

Authors :
Ma, Pingchuan
Wang, Yujiang
Shen, Jie
Petridis, Stavros
Pantic, Maja
Source :
2021 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 2856-2865, 2021
Publication Year :
2020

Abstract

In this work, we present the Densely Connected Temporal Convolutional Network (DC-TCN) for lip-reading of isolated words. Although Temporal Convolutional Networks (TCNs) have recently demonstrated great potential in many vision tasks, their receptive fields are not dense enough to model the complex temporal dynamics in lip-reading scenarios. To address this problem, we introduce dense connections into the network to capture more robust temporal features. Moreover, our approach utilises the Squeeze-and-Excitation block, a light-weight attention mechanism, to further enhance the model's classification power. Without bells and whistles, our DC-TCN method achieves 88.36% accuracy on the Lip Reading in the Wild (LRW) dataset and 43.65% on the LRW-1000 dataset, surpassing all the baseline methods and setting a new state of the art on both datasets.

Comment: WACV 2021. An improved code implementation is available at: https://github.com/mpc001/Lipreading_using_Temporal_Convolutional_Networks
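
The abstract's point about receptive-field density can be illustrated with a small calculation. The sketch below is not the paper's configuration: the kernel sizes and dilations are hypothetical, and the model is simplified to a chain of dilated 1-D convolutions in which dense connections let features skip any subset of layers. Under those assumptions, a plain stack reaches the output through a single path with one receptive-field size, whereas the densely connected version mixes paths of many sizes.

```python
def layer_growth(kernel_size, dilation):
    """Frames added to the receptive field by one dilated 1-D convolution."""
    return (kernel_size - 1) * dilation


def stacked_receptive_field(layers):
    """Single receptive-field size of a plain stack (every layer is traversed)."""
    return 1 + sum(layer_growth(k, d) for k, d in layers)


def dense_receptive_fields(layers):
    """Receptive-field sizes available when each layer sees all earlier outputs.

    With dense connections, a feature may bypass any subset of layers, so the
    reachable sizes are 1 plus every subset sum of the per-layer growths.
    """
    sizes = {1}  # the raw input frame itself
    for k, d in layers:
        growth = layer_growth(k, d)
        sizes |= {s + growth for s in sizes}
    return sorted(sizes)


# Hypothetical configuration (kernel_size, dilation); not taken from the paper.
layers = [(3, 1), (3, 2), (3, 4)]

print(stacked_receptive_field(layers))  # 15
print(dense_receptive_fields(layers))   # [1, 3, 5, 7, 9, 11, 13, 15]
```

In this toy setting the plain stack covers only a 15-frame window, while the densely connected variant combines eight window sizes from 1 to 15 frames, which is one way to read the claim that dense connections yield denser temporal coverage.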

Details

Database :
arXiv
Journal :
2021 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 2856-2865, 2021
Publication Type :
Report
Accession number :
edsarx.2009.14233
Document Type :
Working Paper
Full Text :
https://doi.org/10.1109/WACV48630.2021.00290