1. Debiased Visual Question Answering via the perspective of question types.
- Author
- Huai, Tianyu, Yang, Shuwen, Zhang, Junhang, Zhao, Jiabao, and He, Liang
- Subjects
- *COUNTERFACTUALS (Logic), *ANNOTATIONS
- Abstract
- Visual Question Answering (VQA) aims to answer questions according to a given image. However, current VQA models tend to rely solely on the textual information in the questions and ignore the visual information in the images, a behaviour caused by bias acquired during the training phase. Previous studies have shown that bias in VQA is mainly caused by the text modality, and our analysis suggests that question type is a crucial factor in bias formation. To address this bias, we propose a self-supervised method comprising an Against Biased Samples (ABS) module, which performs targeted debiasing by selecting samples that are prone to bias, and a Shuffle Question Types (SQT) module, which constructs negative samples by randomly replacing the question types of the samples selected by ABS, thereby interrupting the shortcut from question type to answer. Our approach mitigates the question-to-answer bias without using external annotations, overcoming the language-prior problem. Additionally, we design a new objective function for the negative samples. Experimental results indicate that our method outperforms both self-supervised and supervised state-of-the-art approaches, achieving 70.36% accuracy on the VQA-CP v2 dataset.
• We propose a framework that uses image–question pairs to construct counterfactual samples.
• SQT and ABS can target data in the training set that is prone to bias and debias it.
• We introduce a novel loss function based on the constructed negative samples.
• Our method achieves state-of-the-art performance on the benchmark VQA-CP v2 dataset.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
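For a concrete, if simplified, picture of the SQT idea described in the abstract, the sketch below constructs negative samples by swapping question-type prefixes within a batch, so that the type-to-answer shortcut no longer lines up with the paired image. This is a minimal illustration under stated assumptions, not the authors' implementation: the batch layout (separate `question_type` and `question_body` fields), the `shuffle_question_types` helper, and the way mismatched types are sampled are all hypothetical choices made for the example.

```python
import random


def shuffle_question_types(batch, rng=random):
    """Build negative samples by giving each question a mismatched
    question-type prefix drawn from another sample in the batch.

    `batch` is assumed to be a list of dicts with keys:
      'question_type'  (e.g. "what color is the"),
      'question_body'  (the remainder of the question),
      'image'          (the paired image or image features).
    The image and question body are kept; only the question-type
    prefix is replaced, so the question-type-to-answer shortcut no
    longer matches the original (image, answer) pair.
    """
    types = [s["question_type"] for s in batch]
    negatives = []
    for sample in batch:
        # Prefer a question type different from the sample's own;
        # fall back to any type if the batch contains only one.
        candidates = [t for t in types if t != sample["question_type"]] or types
        new_type = rng.choice(candidates)
        negatives.append({
            "image": sample["image"],
            "question": f'{new_type} {sample["question_body"]}',
            "is_negative": True,
        })
    return negatives


if __name__ == "__main__":
    batch = [
        {"question_type": "what color is the", "question_body": "banana?", "image": "img_0"},
        {"question_type": "how many", "question_body": "dogs are there?", "image": "img_1"},
        {"question_type": "is there a", "question_body": "cat on the sofa?", "image": "img_2"},
    ]
    for neg in shuffle_question_types(batch):
        print(neg["question"])
```

In the paper's pipeline these shuffled pairs would be fed to the VQA model as negatives under the proposed objective; the abstract does not specify that objective, so any penalty (for example, discouraging confident predictions of the original ground-truth answer on such pairs) would be an assumption beyond what is shown here.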