Multiple answers to a question: a new approach for visual question answering.

Authors :: Hashemi Hosseinabad, Sayedshayan
Safayani, Mehran
Mirzaei, Abdolreza
Source :: Visual Computer. 2021, Vol. 37 Issue 1, p119-131. 13p.
Publication Year :: 2021
Abstract: With the advent of deep learning, multi-modal data have attracted great interest. One multi-modal task that falls within the computer vision domain is visual question answering (VQA). In VQA, a question and an image are given to the model, which tries to answer the question based on the image. To the best of our knowledge, current techniques examine the image and give only one answer to the question asked. However, in some situations a question has several answers. In this paper, we address this problem and define a new domain within the VQA task, together with a new, computationally efficient approach to multiple-answer VQA. In this approach, we apply a sliding window efficiently to examine the answer to the question in different parts of the image. Because no suitable dataset for multiple-answer VQA has been available so far, we provide a new dataset for evaluating our proposed model. The experiments show that our model requires 94% fewer operations than other models, making it well suited to real-time applications. [ABSTRACT FROM AUTHOR]
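The abstract does not specify the model, so the following is only a minimal sketch of the general sliding-window idea for collecting several answers. It assumes the image is represented as a 2-D grid of per-cell values and that a hypothetical per-window answer function `answer_fn` returns an (answer, confidence) pair. The names `sliding_windows`, `crop`, `multi_answer`, `toy_answer`, and the window size, stride, and confidence threshold are illustrative assumptions, not the authors' architecture, and the sketch does not attempt to reproduce their reported efficiency gains.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for illustration: a 2-D grid of per-cell values and a
# per-window answer function returning (answer, confidence).
Grid = Sequence[Sequence[float]]
AnswerFn = Callable[[List[List[float]], str], Tuple[str, float]]


def sliding_windows(height: int, width: int, win: int, stride: int) -> List[Tuple[int, int]]:
    """Top-left (row, col) corners of every win x win window that fits in the grid."""
    if win > height or win > width:
        return []
    return [(r, c)
            for r in range(0, height - win + 1, stride)
            for c in range(0, width - win + 1, stride)]


def crop(grid: Grid, r: int, c: int, win: int) -> List[List[float]]:
    """Copy the win x win sub-grid whose top-left corner is (r, c)."""
    return [list(row[c:c + win]) for row in grid[r:r + win]]


def multi_answer(grid: Grid, question: str, answer_fn: AnswerFn,
                 win: int, stride: int, threshold: float) -> List[str]:
    """Ask the question of every window; keep distinct confident answers in scan order."""
    if not grid or not grid[0]:
        return []
    answers: List[str] = []
    for r, c in sliding_windows(len(grid), len(grid[0]), win, stride):
        answer, score = answer_fn(crop(grid, r, c, win), question)
        if score >= threshold and answer not in answers:
            answers.append(answer)
    return answers


# Toy usage (hypothetical): cell values are class ids, and the toy answer
# function simply names the highest class id visible in the window.
LABELS = {0: "nothing", 1: "cat", 2: "dog"}


def toy_answer(window: List[List[float]], question: str) -> Tuple[str, float]:
    value = int(max(max(row) for row in window))
    return LABELS[value], (1.0 if value else 0.0)


image = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 2],
    [0, 0, 0, 0],
]
print(multi_answer(image, "What animals are in the picture?", toy_answer,
                   win=2, stride=2, threshold=0.5))  # ['cat', 'dog']
```

A model that looks at the whole image and returns a single answer would report only one of these labels; gathering confident answers per window is what lets the sketch return both. Choosing a stride smaller than the window size gives overlapping windows, which improves coverage at the cost of more evaluations.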