Author: "Farazi, Moshiur R" - Searchworks@Jio Institute Digital Library Search Results

7 results on '"Farazi, Moshiur R"'

1. Accuracy vs. Complexity: A Trade-off in Visual Question Answering Models

Author: Farazi, Moshiur R., Khan, Salman H., and Barnes, Nick
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computational Complexity
Abstract: Visual Question Answering (VQA) has emerged as a Visual Turing Test to validate the reasoning ability of AI agents. The pivot to existing VQA models is the joint embedding that is learned by combining the visual features from an image and the semantic features from a given question. Consequently, a large body of literature has focused on developing complex joint embedding strategies coupled with visual attention mechanisms to effectively capture the interplay between these two modalities. However, modelling the visual and semantic features in a high dimensional (joint embedding) space is computationally expensive, and more complex models often result in trivial improvements in the VQA accuracy. In this work, we systematically study the trade-off between the model complexity and the performance on the VQA task. VQA models have a diverse architecture comprising pre-processing, feature extraction, multimodal fusion, attention and final classification stages. We specifically focus on the effect of "multi-modal fusion" in VQA models, which is typically the most expensive step in a VQA pipeline. Our thorough experimental evaluation leads us to two proposals, one optimized for minimal complexity and the other one optimized for state-of-the-art VQA performance.
Published: 2020

2. Question-Agnostic Attention for Visual Question Answering

Author: Farazi, Moshiur R, Khan, Salman H, and Barnes, Nick
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Visual Question Answering (VQA) models employ attention mechanisms to discover image locations that are most relevant for answering a specific question. For this purpose, several multimodal fusion strategies have been proposed, ranging from relatively simple operations (e.g., linear sum) to more complex ones (e.g., Block). The resulting multimodal representations define an intermediate feature space for capturing the interplay between visual and semantic features that is helpful in selectively focusing on image content. In this paper, we propose a question-agnostic attention mechanism that is complementary to the existing question-dependent attention mechanisms. Our proposed model parses object instances to obtain an 'object map' and applies this map on the visual features to generate Question-Agnostic Attention (QAA) features. In contrast to question-dependent attention approaches that are learned end-to-end, the proposed QAA does not involve question-specific training, and can be easily included in almost any existing VQA model as a generic light-weight pre-processing step, thereby adding minimal computation overhead for training. Further, when used in complement with the question-dependent attention, the QAA allows the model to focus on the regions containing objects that might have been overlooked by the learned attention representation. Through extensive evaluation on VQAv1, VQAv2 and TDIUC datasets, we show that incorporating complementary QAA allows state-of-the-art VQA models to perform better, and provides a significant boost to simplistic VQA models, enabling them to perform on par with highly sophisticated fusion strategies.
Comment: To appear in the proceedings of International Conference on Pattern Recognition (ICPR) 2020
Published: 2019
Full Text: View/download PDF

3. From Known to the Unknown: Transferring Knowledge to Answer Questions about Novel Visual and Semantic Concepts

Author: Farazi, Moshiur R, Khan, Salman H, and Barnes, Nick
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Current Visual Question Answering (VQA) systems can answer intelligent questions about 'Known' visual content. However, their performance drops significantly when questions about visually and linguistically 'Unknown' concepts are presented during inference ('Open-world' scenario). A practical VQA system should be able to deal with novel concepts in real world settings. To address this problem, we propose an exemplar-based approach that transfers learning (i.e., knowledge) from previously 'Known' concepts to answer questions about the 'Unknown'. We learn a highly discriminative joint embedding space, where visual and semantic features are fused to give a unified representation. Once novel concepts are presented to the model, it looks for the closest match from an exemplar set in the joint embedding space. This auxiliary information is used alongside the given Image-Question pair to refine visual attention in a hierarchical fashion. Since handling the high dimensional exemplars on large datasets can be a significant challenge, we introduce an efficient matching scheme that uses a compact feature description for search and retrieval. To evaluate our model, we propose a new split for VQA, separating Unknown visual and semantic concepts from the training set. Our approach shows significant improvements over state-of-the-art VQA models on the proposed Open-World VQA dataset and standard VQA datasets.
Published: 2018

4. Reciprocal Attention Fusion for Visual Question Answering

Author: Farazi, Moshiur R and Khan, Salman H
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Existing attention mechanisms attend to either local image grid or object level features for Visual Question Answering (VQA). Motivated by the observation that questions can relate to both object instances and their parts, we propose a novel attention mechanism that jointly considers reciprocal relationships between the two levels of visual details. The bottom-up attention thus generated is further coalesced with the top-down information to only focus on the scene elements that are most relevant to a given question. Our design hierarchically fuses multi-modal information, i.e., language, object- and grid-level features, through an efficient tensor decomposition scheme. The proposed model improves the state-of-the-art single model performances from 67.9% to 68.2% on VQAv1 and from 65.7% to 67.4% on VQAv2, demonstrating a significant boost.
Comment: To appear in the British Machine Vision Conference (BMVC), September 2018
Published: 2018

5. HairNet: a deep learning model to score leaf hairiness, a key phenotype for cotton fibre yield, value and insect resistance

Author: Rolland, Vivien, Farazi, Moshiur R., Conaty, Warren C., Cameron, Deon, Liu, Shiming, Petersson, Lars, and Stiller, Warwick N.
Published: 2022
Full Text: View/download PDF

6. From known to the unknown: Transferring knowledge to answer questions about novel visual and semantic concepts

Author: Farazi, Moshiur R., Khan, Salman H., and Barnes, Nick
Published: 2020
Full Text: View/download PDF

7. Inpainting multiple sclerosis lesions for improving registration performance with brain atlas

Author: Farazi, Moshiur R, Faisal, Fahim, Zaman, Zaied, and Farhan, Soumik
Published: 2016
Full Text: View/download PDF
