
Infer unseen from seen: Relation regularized zero-shot visual dialog.

Authors :
Zhang, Zefan
Li, Shun
Ji, Yi
Liu, Chunping
Source :
Journal of Visual Communication & Image Representation, Vol. 97, Dec 2023.
Publication Year :
2023

Abstract

The Visual Dialog task requires retrieving the correct answer based on detected objects, the current question, and the dialog history. In real-world scenarios, however, most existing models face the hard-positive problem and are unable to reason about unseen features, which limits their generalization ability. To address this issue, we propose two Relation Regularized Modules (RRM) in this article. The first is the Visual Relation Regularized Module (VRRM), which seeks known visual features that have semantic relations with unknown visual features and leverages these known features to assist in understanding the unknown ones. The second is the Text Relation Regularized Module (TRRM), which enhances the keywords in the answers to strengthen the understanding of unknown text features. To evaluate the effectiveness of these modules, we propose two zero-shot Visual Dialog splits for verification: Visual Zero-shot VisDial, with unseen visual features, and Text Zero-shot VisDial, with unseen answers. Experimental results demonstrate that our proposed modules achieve state-of-the-art performance in zero-shot Visual Dialog with unseen visual features and unseen answers, while also producing comparable results on the benchmark VisDial v1.0 test set.

• Define a new zero-shot Visual Dialog problem.
• Propose the Visual Relation Regularized Module.
• Propose the Text Relation Regularized Module.
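The core VRRM idea described in the abstract — representing an unseen visual feature through seen features that are semantically related to it — can be illustrated with a similarity-weighted aggregation. This is a minimal sketch under assumed design choices (cosine similarity, top-k selection, softmax weighting); it is not the paper's actual formulation, and all names here are hypothetical.

```python
import numpy as np

def relate_unseen_to_seen(unseen, seen_bank, top_k=3):
    """Approximate an unseen feature vector by a similarity-weighted
    combination of its top-k most related seen features.

    Illustrative only: the weighting scheme is an assumption, not the
    VRRM module from the paper.
    """
    # Cosine similarity between the unseen feature and each seen feature.
    u = unseen / np.linalg.norm(unseen)
    s = seen_bank / np.linalg.norm(seen_bank, axis=1, keepdims=True)
    sims = s @ u                        # shape: (num_seen,)

    # Keep only the top-k most semantically related seen features.
    top = np.argsort(sims)[-top_k:]

    # Softmax over the selected similarities gives aggregation weights.
    w = np.exp(sims[top])
    w /= w.sum()

    # The weighted aggregate of related seen features stands in for
    # (regularizes) the representation of the unseen feature.
    return w @ seen_bank[top]
```

With `top_k=1` this reduces to nearest-neighbor retrieval from the seen feature bank; larger `top_k` blends several related features, which is closer in spirit to using relations among known features to interpret an unknown one.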

Details

Language :
English
ISSN :
1047-3203
Volume :
97
Database :
Academic Search Index
Journal :
Journal of Visual Communication & Image Representation
Publication Type :
Academic Journal
Accession number :
173992073
Full Text :
https://doi.org/10.1016/j.jvcir.2023.103961