Uncovering the Temporal Context for Video Question Answering.

Authors :
Zhu, Linchao
Xu, Zhongwen
Yang, Yi
Hauptmann, Alexander
Source :
International Journal of Computer Vision. Sep 2017, Vol. 124, Issue 3, pp. 409-421. 13 pp.
Publication Year :
2017

Abstract

In this work, we introduce Video Question Answering in the temporal domain to infer the past, describe the present and predict the future. We present an encoder-decoder approach using Recurrent Neural Networks to learn the temporal structures of videos and introduce a dual-channel ranking loss to answer multiple-choice questions. We explore approaches for finer understanding of video content using the 'fill-in-the-blank' question form, and collect our Video Context QA dataset, which consists of 109,895 video clips with a total duration of more than 1,000 hours, drawn from the existing TACoS, MPII-MD and MEDTest 14 datasets. In addition, 390,744 corresponding questions are generated from annotations. Extensive experiments demonstrate that our approach significantly outperforms the compared baselines. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
0920-5691
Volume :
124
Issue :
3
Database :
Academic Search Index
Journal :
International Journal of Computer Vision
Publication Type :
Academic Journal
Accession number :
124729101
Full Text :
https://doi.org/10.1007/s11263-017-1033-7