1. Automatic Lip Reading Using Convolution Neural Network and Bidirectional Long Short-term Memory.
- Author
- Lu, Yuanyao and Yan, Jie
- Subjects
- *SHORT-term memory, *MATHEMATICAL convolutions, *LIPREADING, *MOVEMENT sequences, *LIPS, *AUTOMATIC speech recognition, *FEATURE extraction
- Abstract
- Traditional automatic lip-reading systems generally consist of two stages: feature extraction and recognition. However, handcrafted features are empirical and cannot sufficiently capture the correlations within a lip movement sequence. Recently, deep learning approaches have attracted increasing attention, especially the significant improvements achieved by convolution neural networks (CNN) in image classification and by long short-term memory (LSTM) in speech recognition, video processing and text analysis. In this paper, we propose a hybrid neural network architecture that integrates a CNN and a bidirectional LSTM (BiLSTM) for lip reading. First, we extract key frames from each isolated video clip and use five key points to locate the mouth region. Then, features are extracted from the raw mouth images using an eight-layer CNN. The extracted features are more robust and fault-tolerant. Finally, we use the BiLSTM to capture the correlation of sequential information among frame features in both directions and the softmax function to predict the final recognition result. The proposed method is capable of extracting local features through convolution operations and finding hidden correlations in temporal information from lip image sequences. The results of lip-reading recognition experiments demonstrate that our proposed method outperforms conventional approaches such as the active contour model (ACM) and the hidden Markov model (HMM). [ABSTRACT FROM AUTHOR]
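The final stage described in the abstract (a bidirectional LSTM over per-frame features, followed by softmax) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. The feature dimension, hidden size, class count and frame count are made-up values, the weights are random and untrained, bias terms are omitted for brevity, and random vectors stand in for the CNN features. All function names are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def init_lstm(input_size, hidden_size, rng):
    # One weight matrix per gate (input, forget, candidate, output),
    # each acting on the concatenation [x_t, h_{t-1}]. Biases omitted.
    def mat():
        return [[rng.uniform(-0.1, 0.1) for _ in range(input_size + hidden_size)]
                for _ in range(hidden_size)]
    return {gate: mat() for gate in "ifgo"}

def lstm_pass(params, frames, hidden_size):
    # Run a single-direction LSTM over the frames; return the last hidden state.
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in frames:
        z = list(x) + h
        i = [sigmoid(a) for a in matvec(params["i"], z)]
        f = [sigmoid(a) for a in matvec(params["f"], z)]
        g = [math.tanh(a) for a in matvec(params["g"], z)]
        o = [sigmoid(a) for a in matvec(params["o"], z)]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify_sequence(frames, fwd, bwd, W_out, hidden_size):
    # Forward pass reads frames in order, backward pass reads them reversed;
    # the two final states are concatenated and mapped to class probabilities.
    h_fwd = lstm_pass(fwd, frames, hidden_size)
    h_bwd = lstm_pass(bwd, frames[::-1], hidden_size)
    return softmax(matvec(W_out, h_fwd + h_bwd))

# Assumed sizes, chosen only for illustration.
FEATURE_DIM, HIDDEN, NUM_CLASSES, NUM_FRAMES = 8, 4, 10, 6
rng = random.Random(0)
# Random vectors standing in for CNN features of the key frames.
frames = [[rng.gauss(0, 1) for _ in range(FEATURE_DIM)] for _ in range(NUM_FRAMES)]
fwd = init_lstm(FEATURE_DIM, HIDDEN, rng)
bwd = init_lstm(FEATURE_DIM, HIDDEN, rng)
W_out = [[rng.uniform(-0.1, 0.1) for _ in range(2 * HIDDEN)]
         for _ in range(NUM_CLASSES)]
probs = classify_sequence(frames, fwd, bwd, W_out, HIDDEN)
```

Here `probs` is a probability distribution over the assumed classes. In a trained system the weights would be learned from labeled clips, and a framework layer with a bidirectional option would normally replace the hand-written loop.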
- Published
- 2020