
End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features

Authors:
Hori, Chiori
Alamri, Huda
Wang, Jue
Wichern, Gordon
Hori, Takaaki
Cherian, Anoop
Marks, Tim K.
Cartillier, Vincent
Lopes, Raphael Gontijo
Das, Abhishek
Essa, Irfan
Batra, Dhruv
Parikh, Devi
Publication Year: 2018

Abstract

Dialog systems need to understand dynamic visual scenes in order to have conversations with users about the objects and events around them. Scene-aware dialog systems for real-world applications could be developed by integrating state-of-the-art technologies from multiple research areas, including: end-to-end dialog technologies, which generate system responses using models trained from dialog data; visual question answering (VQA) technologies, which answer questions about images using learned image features; and video description technologies, in which descriptions/captions are generated from videos using multimodal information. We introduce a new dataset of dialogs about videos of human behaviors. Each dialog is a typed conversation consisting of a sequence of 10 question-and-answer (QA) pairs between two Amazon Mechanical Turk (AMT) workers. In total, we collected dialogs on roughly 9,000 videos. Using this new dataset for Audio Visual Scene-Aware Dialog (AVSD), we trained an end-to-end conversation model that generates responses in a dialog about a video. Our experiments demonstrate that using multimodal features developed for multimodal attention-based video description enhances the quality of generated dialog about dynamic scenes (videos). Our dataset, model code, and pretrained models will be publicly available for a new Video Scene-Aware Dialog challenge.

Comment: A prototype system for the Audio Visual Scene-Aware Dialog (AVSD) track at DSTC7
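The key technical idea in the abstract is a two-level attention over video features: temporal attention within each modality's feature stream (e.g., visual and audio), followed by an attention over the modalities themselves when forming the context used to generate a dialog response. The sketch below illustrates that fusion pattern in PyTorch; it is a minimal illustration, and the class name, projection layout, and feature sizes (MultimodalAttention, the I3D-like 2048-d visual features, the 128-d audio features) are assumptions for this sketch, not the authors' released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultimodalAttention(nn.Module):
    # Temporal attention within each modality, then attention over modalities.
    # All names and dimensions here are illustrative assumptions.
    def __init__(self, feat_dims, state_dim, attn_dim):
        super().__init__()
        # One additive-attention scorer per modality (e.g., visual, audio).
        self.feat_proj = nn.ModuleList([nn.Linear(d, attn_dim) for d in feat_dims])
        self.state_proj = nn.Linear(state_dim, attn_dim)
        self.score = nn.ModuleList([nn.Linear(attn_dim, 1) for _ in feat_dims])
        # Project each modality's attended context to a shared size for fusion.
        self.ctx_proj = nn.ModuleList([nn.Linear(d, attn_dim) for d in feat_dims])
        # Modality-level scorer: how relevant is each modality right now?
        self.mod_score = nn.Linear(attn_dim + state_dim, 1)

    def forward(self, feats, state):
        # feats: list of (batch, time_m, feat_dims[m]) tensors, one per modality
        # state: (batch, state_dim) decoder hidden state
        s = self.state_proj(state).unsqueeze(1)                       # (B, 1, attn_dim)
        contexts = []
        for m, x in enumerate(feats):
            e = self.score[m](torch.tanh(self.feat_proj[m](x) + s))  # (B, T, 1)
            a = F.softmax(e, dim=1)                                   # temporal weights
            ctx = (a * x).sum(dim=1)                                  # (B, feat_dims[m])
            contexts.append(self.ctx_proj[m](ctx))                    # (B, attn_dim)
        stacked = torch.stack(contexts, dim=1)                        # (B, M, attn_dim)
        se = state.unsqueeze(1).expand(-1, stacked.size(1), -1)
        b = F.softmax(self.mod_score(torch.cat([stacked, se], dim=-1)), dim=1)
        return (b * stacked).sum(dim=1)                               # fused (B, attn_dim)

# Example usage with made-up feature sizes:
attn = MultimodalAttention(feat_dims=[2048, 128], state_dim=512, attn_dim=256)
visual = torch.randn(4, 20, 2048)     # e.g., 20 video-segment features
audio = torch.randn(4, 50, 128)       # e.g., 50 audio-frame features
state = torch.randn(4, 512)           # decoder state while generating a response
fused = attn([visual, audio], state)  # -> torch.Size([4, 256])

The modality-level softmax lets the model shift emphasis between vision and audio depending on the current question, which mirrors the behavior the abstract credits for the improved quality of generated responses.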

Details

Database: arXiv
Publication Type: Report
Accession Number: edsarx.1806.08409
Document Type: Working Paper