1. LIVELINET: A Multimodal Deep Recurrent Neural Network to Predict Liveliness in Educational Videos
- Author
Sharma, Arjun, Biswas, Arijit, Gandhi, Ankit, Patil, Sonal, and Deshmukh, Om
- Abstract
Online educational videos have emerged as one of the most popular modes of learning in recent years. Studies have shown that liveliness is highly correlated with engagement in educational videos. While previous work has focused on feature engineering to estimate liveliness, and has done so using only acoustic information, in this paper we propose a technique called LIVELINET that combines audio and visual information to predict liveliness. First, a convolutional neural network is used to predict the visual setup, which in turn identifies the modalities (visual and/or audio) to be used for liveliness prediction. Second, we propose a novel method that uses multimodal deep recurrent neural networks to automatically estimate whether an educational video is lively or not. On the StyleX dataset of 450 one-minute-long educational video snippets, our approach shows relative improvements of 7.6% and 1.9% over a multimodal baseline and a deep network baseline using only audio information, respectively. [For the full proceedings, see ED592609.]
- Published
- 2016