Multimodal Dual Attention Memory for Video Story Question Answering
- Source :
- Computer Vision – ECCV 2018 ISBN: 9783030012663, ECCV (15)
- Publication Year :
- 2018
- Publisher :
- Springer International Publishing, 2018.
-
Abstract
- We propose a video story question-answering (QA) architecture, Multimodal Dual Attention Memory (MDAM). The key idea is to use a dual attention mechanism with late fusion. MDAM uses self-attention to learn the latent concepts in scene frames and captions. Given a question, MDAM applies a second attention over these latent concepts. Multimodal fusion is performed only after the dual attention processes (late fusion). Using this processing pipeline, MDAM learns to infer a high-level vision-language joint representation from an abstraction of the full video content. We evaluate MDAM on the PororoQA and MovieQA datasets, which provide large-scale QA annotations on cartoon videos and movies, respectively. For both datasets, MDAM achieves new state-of-the-art results with significant margins over the runner-up models. Ablation studies confirm that the dual attention mechanism combined with late fusion yields the best performance. We also perform qualitative analysis by visualizing the inference mechanisms of MDAM.
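- The abstract describes a pipeline of per-modality self-attention, question-guided attention, and late fusion. The following is a minimal PyTorch sketch of that pipeline for orientation only; the module choices (nn.MultiheadAttention), dimensions, and layer names are assumptions and do not reproduce the authors' implementation.

```python
# Hypothetical sketch of a dual-attention, late-fusion pipeline in the
# spirit of MDAM. All names and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


class DualAttentionLateFusion(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        # First attention: self-attention learns latent concepts per modality.
        self.frame_self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.caption_self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Second attention: the question attends over the latent concepts.
        self.frame_q_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.caption_q_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Fusion is applied only after both attention stages (late fusion).
        self.fusion = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

    def forward(self, frames, captions, question):
        # frames:   (B, Nf, dim) frame features
        # captions: (B, Nc, dim) caption features
        # question: (B, 1, dim)  question representation
        f, _ = self.frame_self_attn(frames, frames, frames)
        c, _ = self.caption_self_attn(captions, captions, captions)
        f_q, _ = self.frame_q_attn(question, f, f)    # (B, 1, dim)
        c_q, _ = self.caption_q_attn(question, c, c)  # (B, 1, dim)
        # Late fusion of the attended vision and language summaries.
        joint = self.fusion(torch.cat([f_q, c_q], dim=-1).squeeze(1))
        return joint                                  # (B, dim) joint representation
```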
- Subjects :
- Computer science, Artificial intelligence, Deep learning, Natural language processing, Multimodal learning, Question answering, Inference
Details
- ISBN :
- 978-3-030-01266-3
- Database :
- OpenAIRE
- Journal :
- Computer Vision – ECCV 2018 ISBN: 9783030012663, ECCV (15)
- Accession number :
- edsair.doi...........6d1147abb8f341c2425dc640bce0b295