Uncovering the Temporal Context for Video Question Answering.
- Source :
- International Journal of Computer Vision. Sep 2017, Vol. 124, Issue 3, p409-421. 13p.
- Publication Year :
- 2017
Abstract
- In this work, we introduce Video Question Answering in the temporal domain to infer the past, describe the present and predict the future. We present an encoder-decoder approach using Recurrent Neural Networks to learn the temporal structures of videos and introduce a dual-channel ranking loss to answer multiple-choice questions. We explore approaches for finer understanding of video content using the question form of 'fill-in-the-blank', and collect our Video Context QA dataset consisting of 109,895 video clips with a total duration of more than 1000 h from existing TACoS, MPII-MD and MEDTest 14 datasets. In addition, 390,744 corresponding questions are generated from annotations. Extensive experiments demonstrate that our approach significantly outperforms the compared baselines. [ABSTRACT FROM AUTHOR]
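The abstract mentions a ranking loss for multiple-choice answering but does not define it. As a rough illustration only, the sketch below shows a generic margin-based (hinge) ranking loss over candidate answers; it is not the paper's dual-channel formulation. The function names, the dot-product scoring, and the toy vectors are all assumptions made for this example.

```python
# Generic hinge ranking loss for multiple-choice scoring.
# Assumption: a context embedding and one embedding per candidate answer
# are already available; scoring by dot product is a hypothetical choice.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def ranking_loss(context, candidates, correct_index, margin=1.0):
    """Penalize every wrong candidate that scores within `margin`
    of (or above) the correct candidate."""
    scores = [dot(context, c) for c in candidates]
    positive = scores[correct_index]
    return sum(
        max(0.0, margin - positive + s)
        for i, s in enumerate(scores)
        if i != correct_index
    )

def predict(context, candidates):
    """Pick the index of the highest-scoring candidate."""
    scores = [dot(context, c) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy data (made up for illustration).
context = [1.0, 0.0]
candidates = [[2.0, 0.0], [0.5, 0.0], [0.0, 1.0]]
loss_if_first_correct = ranking_loss(context, candidates, correct_index=0)
loss_if_second_correct = ranking_loss(context, candidates, correct_index=1)
chosen = predict(context, candidates)
```

The loss is zero when the correct candidate already beats every alternative by at least the margin, and grows as wrong candidates approach or overtake it.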
Details
- Language :
- English
- ISSN :
- 09205691
- Volume :
- 124
- Issue :
- 3
- Database :
- Academic Search Index
- Journal :
- International Journal of Computer Vision
- Publication Type :
- Academic Journal
- Accession number :
- 124729101
- Full Text :
- https://doi.org/10.1007/s11263-017-1033-7