
Generalizing sentence-level lipreading to unseen speakers: a two-stream end-to-end approach.

Authors :
Li, Yu
Xue, Feng
Wu, Lin
Xie, Yincen
Li, Shujie
Source :
Multimedia Systems. Feb2024, Vol. 30 Issue 1, p1-10. 10p.
Publication Year :
2024

Abstract

Lipreading refers to translating a speaker's lip motion in a video into the corresponding text. Existing lipreading methods typically describe lip motion using variations in visual appearance. However, relying only on these visual variations can lead to inaccurate text, because different words may produce similar lip shapes. Visual features are also hard to generalize to unseen speakers, especially when training data are limited. In this paper, we leverage both lip visual motion and facial landmarks and propose an effective sentence-level end-to-end approach for lipreading. The facial landmarks are introduced to eliminate irrelevant visual features that are sensitive to the specific lip appearance of individual speakers. This enables the model to adapt to the different lip shapes of speakers and to generalize to unseen speakers. Specifically, the proposed framework consists of two branches, corresponding to the visual features and the facial landmarks. The visual branch extracts high-level visual features from the lip movement, and the landmark branch learns to extract both the spatial and temporal patterns described by the landmarks. For each frame, the feature embeddings from the two streams are fused to form a latent vector that can be decoded into text. We employ a sequence-to-sequence model that takes the feature embeddings of all frames as input and decodes them to generate the text. The proposed method is shown to generalize well to unseen speakers on benchmark data sets. [ABSTRACT FROM AUTHOR]
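The abstract describes per-frame fusion of a visual stream and a landmark stream but does not say how landmarks are encoded or how the two embeddings are combined. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: landmarks are normalized by centroid and maximum radius (so speaker-specific position and mouth size drop out), temporal patterns are approximated by frame-to-frame differences, and fusion is plain concatenation. All function names are hypothetical, and the visual embeddings are assumed to come from a separate visual branch that is not shown.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def normalize_landmarks(frame: Sequence[Point]) -> List[float]:
    """Hypothetical spatial encoding: center lip landmarks on their centroid
    and divide by the largest radius, removing translation and scale."""
    n = len(frame)
    cx = sum(x for x, _ in frame) / n
    cy = sum(y for _, y in frame) / n
    centered = [(x - cx, y - cy) for x, y in frame]
    # Guard against a degenerate frame where all points coincide.
    scale = max((x * x + y * y) ** 0.5 for x, y in centered) or 1.0
    return [c / scale for pt in centered for c in pt]


def landmark_stream(frames: Sequence[Sequence[Point]]) -> List[List[float]]:
    """Per-frame spatial features plus temporal deltas to the previous frame
    (an assumed stand-in for the learned landmark branch)."""
    spatial = [normalize_landmarks(f) for f in frames]
    out = []
    for t, feat in enumerate(spatial):
        prev = spatial[t - 1] if t > 0 else feat  # first frame has zero delta
        delta = [a - b for a, b in zip(feat, prev)]
        out.append(feat + delta)
    return out


def fuse(visual: Sequence[Sequence[float]],
         landmark: Sequence[Sequence[float]]) -> List[List[float]]:
    """Assumed fusion by concatenation: one latent vector per frame."""
    if len(visual) != len(landmark):
        raise ValueError("streams must have the same number of frames")
    return [list(v) + list(l) for v, l in zip(visual, landmark)]


if __name__ == "__main__":
    # Two frames of 4 landmarks; the second is the first shifted and scaled.
    frames = [[(0, 0), (2, 0), (2, 2), (0, 2)],
              [(10, 10), (16, 10), (16, 16), (10, 16)]]
    visual = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # placeholder visual embeddings
    latent = fuse(visual, landmark_stream(frames))
    print(len(latent), len(latent[0]))
```

In this toy input the second frame is a shifted, enlarged copy of the first, so its normalized landmarks match and its temporal deltas are zero. This is the kind of appearance invariance the abstract attributes to the landmark stream, though a learned branch would capture far richer patterns. The resulting per-frame vectors are what a sequence-to-sequence decoder would consume.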

Details

Language :
English
ISSN :
0942-4962
Volume :
30
Issue :
1
Database :
Academic Search Index
Journal :
Multimedia Systems
Publication Type :
Academic Journal
Accession number :
175063496
Full Text :
https://doi.org/10.1007/s00530-023-01226-3