Video Dialog via Multi-Grained Convolutional Self-Attention Context Multi-Modal Networks.

Authors :: Gu, Mao
Zhao, Zhou
Jin, Weike
Cai, Deng
Wu, Fei
Source :: IEEE Transactions on Circuits & Systems for Video Technology. Dec2020, Vol. 30 Issue 12, p4453-4466. 14p.
Publication Year :: 2020
Abstract: Video dialog is a new and challenging task, which requires an AI agent to maintain a meaningful dialog with humans in natural language about video contents. Specifically, given a video, a dialog history and a new question about the video, the agent has to combine video information with dialog history to infer the answer. However, the existing methods for image dialog and video question answering, which fail to handle the complexity of video information and to establish the logical dependency of history contexts, cannot be applied directly to video dialog. In this paper, we propose a novel approach for video dialog called multi-grained convolutional self-attention context network, which combines video information with dialog history. Instead of using an RNN to encode the sequence information, we design a multi-grained convolutional self-attention mechanism to capture both element-level and segment-level interactions that contain multi-grained sequence information. Moreover, a hierarchical dialog history encoder is designed to learn the context-aware question representation. Finally, we establish two decoders, in multiple-choice and open-ended forms respectively, which use different strategies to obtain the multi-modal context-aware video representation and to generate human-like answers. We evaluate our method on two large-scale datasets. Due to the flexibility and parallelism of the new attention mechanism, our method can achieve higher time efficiency, and the extensive experiments also show the effectiveness of our method. [ABSTRACT FROM AUTHOR]

Subjects :: *NATURAL languages
*VIDEO codecs
*VIDEO processing
*VIDEOS

Details

Language :: English
ISSN :: 10518215
Volume :: 30
Issue :: 12
Database :: Academic Search Index
Journal :: IEEE Transactions on Circuits & Systems for Video Technology
Publication Type :: Academic Journal
Accession number :: 147575451
Full Text :: https://doi.org/10.1109/TCSVT.2019.2957309
