1. FYEO: A Character Level Model For Lip Reading
- Author
- Vedant Sandeep Joshi and Ebin Deni Raj
- Subjects
- Focus (computing), Modality (human–computer interaction), Process (engineering), Computer science, Deep learning, Field (computer science), Character (mathematics), Human–computer interaction, Reading (process), Conversation, Artificial intelligence
- Abstract
The human mind handles multiple input modalities seamlessly and combines them to make sense of its surroundings. For understanding speech, the two main inputs are sound and vision, although other cues also contribute. Not every mind works alike: some people have trouble processing sound, so vision becomes their primary means of processing and understanding speech. Lip reading is a skill used mainly by people with hearing impairments. It requires a large amount of language-specific knowledge as well as contextual awareness, i.e. using every available visual cue to work out what the other person is saying, which allows the lip reader to take part in the conversation. Recent breakthroughs in deep learning have produced models that can extract complex, intricate and generalizable patterns in both the spatial and temporal dimensions. In this paper we present FYEO (For Your Eyes Only), an end-to-end deep learning solution that uses vision as its only input modality and generates a single word, character by character. The model is a modified version of the LipNet architecture from DeepMind, applied to a subset of words curated from the Oxford-BBC Lip Reading in the Wild (LRW) dataset. As novel work, FYEO is also extended with an attention mechanism to further improve the model's contextual awareness and to observe where the model focuses while making a prediction. The standard FYEO model achieves a length-normalised test CER (character error rate) of 25.024%.
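The abstract reports a length-normalised CER but does not spell out how it is computed. As a rough illustration only, the sketch below assumes the common definition: the Levenshtein (edit) distance between the predicted and reference word, divided by the reference length, averaged over all test words. The function names `edit_distance` and `length_normalised_cer` are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from ref
                curr[j - 1] + 1,     # insert a character into ref
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def length_normalised_cer(references: Sequence[str],
                          hypotheses: Sequence[str]) -> float:
    """Mean per-word CER (assumed definition, not confirmed by the paper)."""
    if len(references) != len(hypotheses) or not references:
        raise ValueError("need equally sized, non-empty inputs")
    total = 0.0
    for ref, hyp in zip(references, hypotheses):
        if not ref:
            raise ValueError("reference words must be non-empty")
        # Normalise each word's edit distance by its reference length.
        total += edit_distance(ref, hyp) / len(ref)
    return total / len(references)


if __name__ == "__main__":
    refs = ["about", "house"]
    hyps = ["abort", "house"]
    print(f"CER = {length_normalised_cer(refs, hyps):.3f}")
```

Under this assumed definition, a single wrong character in a five-letter word contributes a per-word CER of 0.2. Dividing the total number of edits by the total number of reference characters is an alternative convention and gives a slightly different figure when word lengths vary.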
- Published
- 2021