Infer unseen from seen: Relation regularized zero-shot visual dialog.
- Source :
- Journal of Visual Communication & Image Representation. Dec2023, Vol. 97, pN.PAG-N.PAG. 1p.
- Publication Year :
- 2023
Abstract
- The Visual Dialog task requires retrieving the correct answer based on detected objects, a current question, and history dialogs. However, in real-world scenarios, most existing models face the hard-positive problem and are unable to reason about unseen features, which limits their generalization ability. To address this issue, we propose two Relation Regularized Modules (RRM) in this article. The first is the Visual Relation Regularized Module (VRRM), which seeks known visual features that have semantic relations with unknown visual features and leverages these known features to assist in understanding the unknown features. The second is the Text Relation Regularized Module (TRRM), which enhances the keywords in the answers to strengthen the understanding of unknown text features. To evaluate the effectiveness of these modules, we propose two zero-shot Visual Dialog splits for verification: Visual Zero-shot VisDial with unseen visual features and Text Zero-shot VisDial with unseen answers. Experimental results demonstrate that our proposed modules achieve state-of-the-art performance in zero-shot Visual Dialog with unseen visual features and unseen answers, while also producing comparable results on the benchmark VisDial v1.0 test dataset.
• Define a new Zero-shot Visual Dialog problem.
• Propose Visual Relation Regularized Module.
• Propose Text Relation Regularized Module.
[ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 10473203
- Volume :
- 97
- Database :
- Academic Search Index
- Journal :
- Journal of Visual Communication & Image Representation
- Publication Type :
- Academic Journal
- Accession number :
- 173992073
- Full Text :
- https://doi.org/10.1016/j.jvcir.2023.103961