Author: "Fiaz, Mustansar" / Publication Type: Reports - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Fiaz, Mustansar"' showing total 17 results

Start Over Author "Fiaz, Mustansar" Publication Type Reports

17 results on '"Fiaz, Mustansar"'

1. CLIP meets DINO for Tuning Zero-Shot Classifier using Unlabeled Image Collections

Author: Imam, Mohamed Fazli, Marew, Rufael Fedaku, Hassan, Jameel, Fiaz, Mustansar, Aji, Alham Fikri, and Cholakkal, Hisham
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: In the era of foundation models, CLIP has emerged as a powerful tool for aligning text and visual modalities into a common embedding space. However, the alignment objective used to train CLIP often results in subpar visual features for fine-grained tasks. In contrast, SSL-pretrained models like DINO excel at extracting rich visual features due to their specialized training paradigm. Yet, these SSL models require an additional supervised linear probing step, which relies on fully labeled data which is often expensive and difficult to obtain at scale. In this paper, we propose a label-free prompt-tuning method that leverages the rich visual features of self-supervised learning models (DINO) and the broad textual knowledge of large language models (LLMs) to largely enhance CLIP-based image classification performance using unlabeled images. Our approach unfolds in three key steps: (1) We generate robust textual feature embeddings that more accurately represent object classes by leveraging class-specific descriptions from LLMs, enabling more effective zero-shot classification compared to CLIP's default name-specific prompts. (2) These textual embeddings are then used to produce pseudo-labels to train an alignment module that integrates the complementary strengths of LLM description-based textual embeddings and DINO's visual features. (3) Finally, we prompt-tune CLIP's vision encoder through DINO-assisted supervision using the trained alignment module. This three-step process allows us to harness the best of visual and textual foundation models, resulting in a powerful and efficient approach that surpasses state-of-the-art label-free classification methods. Notably, our framework, NoLA (No Labels Attached), achieves an average absolute gain of 3.6% over the state-of-the-art LaFter across 11 diverse image classification datasets.
Published: 2024

2. COSNet: A Novel Semantic Segmentation Network using Enhanced Boundaries in Cluttered Scenes

Author: Ali, Muhammad, Javaid, Mamoona, Noman, Mubashir, Fiaz, Mustansar, and Khan, Salman
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Automated waste recycling aims to efficiently separate the recyclable objects from the waste by employing vision-based systems. However, the presence of varying shaped objects having different material types makes it a challenging problem, especially in cluttered environments. Existing segmentation methods perform reasonably on many semantic segmentation datasets by employing multi-contextual representations, however, their performance is degraded when utilized for waste object segmentation in cluttered scenarios. In addition, plastic objects further increase the complexity of the problem due to their translucent nature. To address these limitations, we introduce an efficacious segmentation network, named COSNet, that uses boundary cues along with multi-contextual information to accurately segment the objects in cluttered scenes. COSNet introduces novel components including feature sharpening block (FSB) and boundary enhancement module (BEM) for enhancing the features and highlighting the boundary information of irregular waste objects in cluttered environment. Extensive experiments on three challenging datasets including ZeroWaste-f, SpectralWaste, and ADE20K demonstrate the effectiveness of the proposed method. Our COSNet achieves a significant gain of 1.8% on ZeroWaste-f and 2.1% on SpectralWaste datasets respectively in terms of mIoU metric., Comment: Accepted at WACV 2025
Published: 2024

3. Improving 3D Medical Image Segmentation at Boundary Regions using Local Self-attention and Global Volume Mixing

Author: Kareem, Daniya Najiha Abdul, Fiaz, Mustansar, Novershtern, Noa, Hanna, Jacob, and Cholakkal, Hisham
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Volumetric medical image segmentation is a fundamental problem in medical image analysis where the objective is to accurately classify a given 3D volumetric medical image with voxel-level precision. In this work, we propose a novel hierarchical encoder-decoder-based framework that strives to explicitly capture the local and global dependencies for volumetric 3D medical image segmentation. The proposed framework exploits local volume-based self-attention to encode the local dependencies at high resolution and introduces a novel volumetric MLP-mixer to capture the global dependencies at low-resolution feature representations, respectively. The proposed volumetric MLP-mixer learns better associations among volumetric feature representations. These explicit local and global feature representations contribute to better learning of the shape-boundary characteristics of the organs. Extensive experiments on three different datasets reveal that the proposed method achieves favorable performance compared to state-of-the-art approaches. On the challenging Synapse Multi-organ dataset, the proposed method achieves an absolute 3.82\% gain over the state-of-the-art approaches in terms of HD95 evaluation metrics {while a similar improvement pattern is exhibited in MSD Liver and Pancreas tumor datasets}. We also provide a detailed comparison between recent architectural design choices in the 2D computer vision literature by adapting them for the problem of 3D medical image segmentation. Finally, our experiments on the ZebraFish 3D cell membrane dataset having limited training data demonstrate the superior transfer learning capabilities of the proposed vMixer model on the challenging 3D cell instance segmentation task, where accurate boundary prediction plays a vital role in distinguishing individual cell instances.
Published: 2024

4. A Survey of the Self Supervised Learning Mechanisms for Vision Transformers

Author: Khan, Asifullah, Sohail, Anabia, Fiaz, Mustansar, Hassan, Mehdi, Afridi, Tariq Habib, Marwat, Sibghat Ullah, Munir, Farzeen, Ali, Safdar, Naseem, Hannan, Zaheer, Muhammad Zaigham, Ali, Kamran, Sultana, Tangina, Tanoli, Ziaurrehman, and Akhter, Naeem
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Deep supervised learning models require high volume of labeled data to attain sufficiently good results. Although, the practice of gathering and annotating such big data is costly and laborious. Recently, the application of self supervised learning (SSL) in vision tasks has gained significant attention. The intuition behind SSL is to exploit the synchronous relationships within the data as a form of self-supervision, which can be versatile. In the current big data era, most of the data is unlabeled, and the success of SSL thus relies in finding ways to utilize this vast amount of unlabeled data available. Thus it is better for deep learning algorithms to reduce reliance on human supervision and instead focus on self-supervision based on the inherent relationships within the data. With the advent of ViTs, which have achieved remarkable results in computer vision, it is crucial to explore and understand the various SSL mechanisms employed for training these models specifically in scenarios where there is limited labelled data available. In this survey, we develop a comprehensive taxonomy of systematically classifying the SSL techniques based upon their representations and pre-training tasks being applied. Additionally, we discuss the motivations behind SSL, review popular pre-training tasks, and highlight the challenges and advancements in this field. Furthermore, we present a comparative analysis of different SSL methods, evaluate their strengths and limitations, and identify potential avenues for future research., Comment: 34 Pages, 5 Figures, 7 Tables
Published: 2024

5. FANet: Feature Amplification Network for Semantic Segmentation in Cluttered Background

Author: Ali, Muhammad, Javaid, Mamoona, Noman, Mubashir, Fiaz, Mustansar, and Khan, Salman
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Existing deep learning approaches leave out the semantic cues that are crucial in semantic segmentation present in complex scenarios including cluttered backgrounds and translucent objects, etc. To handle these challenges, we propose a feature amplification network (FANet) as a backbone network that incorporates semantic information using a novel feature enhancement module at multi-stages. To achieve this, we propose an adaptive feature enhancement (AFE) block that benefits from both a spatial context module (SCM) and a feature refinement module (FRM) in a parallel fashion. SCM aims to exploit larger kernel leverages for the increased receptive field to handle scale variations in the scene. Whereas our novel FRM is responsible for generating semantic cues that can capture both low-frequency and high-frequency regions for better segmentation tasks. We perform experiments over challenging real-world ZeroWaste-f dataset which contains background-cluttered and translucent objects. Our experimental results demonstrate the state-of-the-art performance compared to existing methods., Comment: Accepted at ICIP 2024
Published: 2024

6. Medical Image Segmentation Using Directional Window Attention

Author: Kareem, Daniya Najiha Abdul, Fiaz, Mustansar, Novershtern, Noa, and Cholakkal, Hisham
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Accurate segmentation of medical images is crucial for diagnostic purposes, including cell segmentation, tumor identification, and organ localization. Traditional convolutional neural network (CNN)-based approaches struggled to achieve precise segmentation results due to their limited receptive fields, particularly in cases involving multi-organ segmentation with varying shapes and sizes. The transformer-based approaches address this limitation by leveraging the global receptive field, but they often face challenges in capturing local information required for pixel-precise segmentation. In this work, we introduce DwinFormer, a hierarchical encoder-decoder architecture for medical image segmentation comprising a directional window (Dwin) attention and global self-attention (GSA) for feature encoding. The focus of our design is the introduction of Dwin block within DwinFormer that effectively captures local and global information along the horizontal, vertical, and depthwise directions of the input feature map by separately performing attention in each of these directional volumes. To this end, our Dwin block introduces a nested Dwin attention (NDA) that progressively increases the receptive field in horizontal, vertical, and depthwise directions and a convolutional Dwin attention (CDA) that captures local contextual information for the attention computation. While the proposed Dwin block captures local and global dependencies at the first two high-resolution stages of DwinFormer, the GSA block encodes global dependencies at the last two lower-resolution stages. Experiments over the challenging 3D Synapse Multi-organ dataset and Cell HMS dataset demonstrate the benefits of our DwinFormer over the state-of-the-art approaches. Our source code will be publicly available at \url{https://github.com/Daniyanaj/DWINFORMER}., Comment: 5 pages
Published: 2024

7. ChangeBind: A Hybrid Change Encoder for Remote Sensing Change Detection

Author: Noman, Mubashir, Fiaz, Mustansar, and Cholakkal, Hisham
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Change detection (CD) is a fundamental task in remote sensing (RS) which aims to detect the semantic changes between the same geographical regions at different time stamps. Existing convolutional neural networks (CNNs) based approaches often struggle to capture long-range dependencies. Whereas recent transformer-based methods are prone to the dominant global representation and may limit their capabilities to capture the subtle change regions due to the complexity of the objects in the scene. To address these limitations, we propose an effective Siamese-based framework to encode the semantic changes occurring in the bi-temporal RS images. The main focus of our design is to introduce a change encoder that leverages local and global feature representations to capture both subtle and large change feature information from multi-scale features to precisely estimate the change regions. Our experimental study on two challenging CD datasets reveals the merits of our approach and obtains state-of-the-art performance., Comment: accepted at IGARSS 2024
Published: 2024

8. ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection

Author: Noman, Mubashir, Fiaz, Mustansar, Cholakkal, Hisham, Khan, Salman, and Khan, Fahad Shahbaz
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Deep learning has shown remarkable success in remote sensing change detection (CD), aiming to identify semantic change regions between co-registered satellite image pairs acquired at distinct time stamps. However, existing convolutional neural network and transformer-based frameworks often struggle to accurately segment semantic change regions. Moreover, transformers-based methods with standard self-attention suffer from quadratic computational complexity with respect to the image resolution, making them less practical for CD tasks with limited training data. To address these issues, we propose an efficient change detection framework, ELGC-Net, which leverages rich contextual information to precisely estimate change regions while reducing the model size. Our ELGC-Net comprises a Siamese encoder, fusion modules, and a decoder. The focus of our design is the introduction of an Efficient Local-Global Context Aggregator module within the encoder, capturing enhanced global context and local spatial information through a novel pooled-transpose (PT) attention and depthwise convolution, respectively. The PT attention employs pooling operations for robust feature extraction and minimizes computational cost with transposed attention. Extensive experiments on three challenging CD datasets demonstrate that ELGC-Net outperforms existing methods. Compared to the recent transformer-based CD approach (ChangeFormer), ELGC-Net achieves a 1.4% gain in intersection over union metric on the LEVIR-CD dataset, while significantly reducing trainable parameters. Our proposed ELGC-Net sets a new state-of-the-art performance in remote sensing change detection benchmarks. Finally, we also introduce ELGC-Net-LW, a lighter variant with significantly reduced computational complexity, suitable for resource-constrained settings, while achieving comparable performance. Project url https://github.com/techmn/elgcnet., Comment: accepted at IEEE TGRS
Published: 2024

9. DDAM-PS: Diligent Domain Adaptive Mixer for Person Search

Author: Almansoori, Mohammed Khaleed, Fiaz, Mustansar, and Cholakkal, Hisham
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Person search (PS) is a challenging computer vision problem where the objective is to achieve joint optimization for pedestrian detection and re-identification (ReID). Although previous advancements have shown promising performance in the field under fully and weakly supervised learning fashion, there exists a major gap in investigating the domain adaptation ability of PS models. In this paper, we propose a diligent domain adaptive mixer (DDAM) for person search (DDAP-PS) framework that aims to bridge a gap to improve knowledge transfer from the labeled source domain to the unlabeled target domain. Specifically, we introduce a novel DDAM module that generates moderate mixed-domain representations by combining source and target domain representations. The proposed DDAM module encourages domain mixing to minimize the distance between the two extreme domains, thereby enhancing the ReID task. To achieve this, we introduce two bridge losses and a disparity loss. The objective of the two bridge losses is to guide the moderate mixed-domain representations to maintain an appropriate distance from both the source and target domain representations. The disparity loss aims to prevent the moderate mixed-domain representations from being biased towards either the source or target domains, thereby avoiding overfitting. Furthermore, we address the conflict between the two subtasks, localization and ReID, during domain adaptation. To handle this cross-task conflict, we forcefully decouple the norm-aware embedding, which aids in better learning of the moderate mixed-domain representation. We conduct experiments to validate the effectiveness of our proposed method. Our approach demonstrates favorable performance on the challenging PRW and CUHK-SYSU datasets. Our source code is publicly available at \url{https://github.com/mustansarfiaz/DDAM-PS}., Comment: Accepted in WACV-2024. Code is here at \url{https://github.com/mustansarfiaz/DDAM-PS
Published: 2023

10. SA2-Net: Scale-aware Attention Network for Microscopic Image Segmentation

Author: Fiaz, Mustansar, Heidari, Moein, Anwer, Rao Muhammad, and Cholakkal, Hisham
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Microscopic image segmentation is a challenging task, wherein the objective is to assign semantic labels to each pixel in a given microscopic image. While convolutional neural networks (CNNs) form the foundation of many existing frameworks, they often struggle to explicitly capture long-range dependencies. Although transformers were initially devised to address this issue using self-attention, it has been proven that both local and global features are crucial for addressing diverse challenges in microscopic images, including variations in shape, size, appearance, and target region density. In this paper, we introduce SA2-Net, an attention-guided method that leverages multi-scale feature learning to effectively handle diverse structures within microscopic images. Specifically, we propose scale-aware attention (SA2) module designed to capture inherent variations in scales and shapes of microscopic regions, such as cells, for accurate segmentation. This module incorporates local attention at each level of multi-stage features, as well as global attention across multiple resolutions. Furthermore, we address the issue of blurred region boundaries (e.g., cell boundaries) by introducing a novel upsampling strategy called the Adaptive Up-Attention (AuA) module. This module enhances the discriminative ability for improved localization of microscopic regions using an explicit attention mechanism. Extensive experiments on five challenging datasets demonstrate the benefits of our SA2-Net model. Our source code is publicly available at \url{https://github.com/mustansarfiaz/SA2-Net}., Comment: BMVC 2023 accepted as oral
Published: 2023

11. Remote Sensing Change Detection With Transformers Trained from Scratch

Author: Noman, Mubashir, Fiaz, Mustansar, Cholakkal, Hisham, Narayan, Sanath, Anwer, Rao Muhammad, Khan, Salman, and Khan, Fahad Shahbaz
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Current transformer-based change detection (CD) approaches either employ a pre-trained model trained on large-scale image classification ImageNet dataset or rely on first pre-training on another CD dataset and then fine-tuning on the target benchmark. This current strategy is driven by the fact that transformers typically require a large amount of training data to learn inductive biases, which is insufficient in standard CD datasets due to their small size. We develop an end-to-end CD approach with transformers that is trained from scratch and yet achieves state-of-the-art performance on four public benchmarks. Instead of using conventional self-attention that struggles to capture inductive biases when trained from scratch, our architecture utilizes a shuffled sparse-attention operation that focuses on selected sparse informative regions to capture the inherent characteristics of the CD data. Moreover, we introduce a change-enhanced feature fusion (CEFF) module to fuse the features from input image pairs by performing a per-channel re-weighting. Our CEFF module aids in enhancing the relevant semantic changes while suppressing the noisy ones. Extensive experiments on four CD datasets reveal the merits of the proposed contributions, achieving gains as high as 14.27\% in intersection-over-union (IoU) score, compared to the best-published results in the literature. Code is available at \url{https://github.com/mustansarfiaz/ScratchFormer}., Comment: 5 figures and 4 tables
Published: 2023

12. PS-ARM: An End-to-End Attention-aware Relation Mixer Network for Person Search

Author: Fiaz, Mustansar, Cholakkal, Hisham, Narayan, Sanath, Anwer, Rao Muhammad, and Khan, Fahad Shahbaz
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Person search is a challenging problem with various real-world applications, that aims at joint person detection and re-identification of a query person from uncropped gallery images. Although, the previous study focuses on rich feature information learning, it is still hard to retrieve the query person due to the occurrence of appearance deformations and background distractors. In this paper, we propose a novel attention-aware relation mixer (ARM) module for person search, which exploits the global relation between different local regions within RoI of a person and make it robust against various appearance deformations and occlusion. The proposed ARM is composed of a relation mixer block and a spatio-channel attention layer. The relation mixer block introduces a spatially attended spatial mixing and a channel-wise attended channel mixing for effectively capturing discriminative relation features within an RoI. These discriminative relation features are further enriched by introducing a spatio-channel attention where the foreground and background discriminability is empowered in a joint spatio-channel space. Our ARM module is generic and it does not rely on fine-grained supervision or topological assumptions, hence being easily integrated into any Faster R-CNN based person search methods. Comprehensive experiments are performed on two challenging benchmark datasets: CUHKSYSU and PRW. Our PS-ARM achieves state-of-the-art performance on both datasets. On the challenging PRW dataset, our PS-ARM achieves an absolute gain of 5 in the mAP score over SeqNet, while operating at a comparable speed., Comment: Paper accepted in ACCV 2022
Published: 2022

13. Brain MRI Segmentation using Rule-Based Hybrid Approach

Author: Fiaz, Mustansar, Ali, Kamran, Rehman, Abdul, Gul, M. Junaid, and Jung, Soon Ki
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Medical image segmentation being a substantial component of image processing plays a significant role to analyze gross anatomy, to locate an infirmity and to plan the surgical procedures. Segmentation of brain Magnetic Resonance Imaging (MRI) is of considerable importance for the accurate diagnosis. However, precise and accurate segmentation of brain MRI is a challenging task. Here, we present an efficient framework for segmentation of brain MR images. For this purpose, Gabor transform method is used to compute features of brain MRI. Then, these features are classified by using four different classifiers i.e., Incremental Supervised Neural Network (ISNN), K-Nearest Neighbor (KNN), Probabilistic Neural Network (PNN), and Support Vector Machine (SVM). Performance of these classifiers is investigated over different images of brain MRI and the variation in the performance of these classifiers is observed for different brain tissues. Thus, we proposed a rule-based hybrid approach to segment brain MRI. Experimental results show that the performance of these classifiers varies over each tissue MRI and the proposed rule-based hybrid approach exhibits better segmentation of brain MRI tissues., Comment: 8 figures
Published: 2019

14. Handcrafted and Deep Trackers: Recent Visual Object Tracking Approaches and Trends

Author: Fiaz, Mustansar, Mahmood, Arif, Javed, Sajid, and Jung, Soon Ki
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In recent years visual object tracking has become a very active research area. An increasing number of tracking algorithms are being proposed each year. It is because tracking has wide applications in various real world problems such as human-computer interaction, autonomous vehicles, robotics, surveillance and security just to name a few. In the current study, we review latest trends and advances in the tracking area and evaluate the robustness of different trackers based on the feature extraction methods. The first part of this work comprises a comprehensive survey of the recently proposed trackers. We broadly categorize trackers into Correlation Filter based Trackers (CFTs) and Non-CFTs. Each category is further classified into various types based on the architecture and the tracking mechanism. In the second part, we experimentally evaluated 24 recent trackers for robustness, and compared handcrafted and deep feature based trackers. We observe that trackers using deep features performed better, though in some cases a fusion of both increased performance significantly. In order to overcome the drawbacks of the existing benchmarks, a new benchmark Object Tracking and Temple Color (OTTC) has also been proposed and used in the evaluation of different algorithms. We analyze the performance of trackers over eleven different challenges in OTTC, and three other benchmarks. Our study concludes that Discriminative Correlation Filter (DCF) based trackers perform better than the others. Our study also reveals that inclusion of different types of regularizations over DCF often results in boosted tracking performance. Finally, we sum up our study by pointing out some insights and indicating future trends in visual object tracking field., Comment: 27pages, 26 figures. arXiv admin note: substantial text overlap with arXiv:1802.03098
Published: 2018

15. Tracking Noisy Targets: A Review of Recent Object Tracking Approaches

Author: Fiaz, Mustansar, Mahmood, Arif, and Jung, Soon Ki
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Visual object tracking is an important computer vision problem with numerous real-world applications including human-computer interaction, autonomous vehicles, robotics, motion-based recognition, video indexing, surveillance and security. In this paper, we aim to extensively review the latest trends and advances in the tracking algorithms and evaluate the robustness of trackers in the presence of noise. The first part of this work comprises a comprehensive survey of recently proposed tracking algorithms. We broadly categorize trackers into correlation filter based trackers and the others as non-correlation filter trackers. Each category is further classified into various types of trackers based on the architecture of the tracking mechanism. In the second part of this work, we experimentally evaluate tracking algorithms for robustness in the presence of additive white Gaussian noise. Multiple levels of additive noise are added to the Object Tracking Benchmark (OTB) 2015, and the precision and success rates of the tracking algorithms are evaluated. Some algorithms suffered more performance degradation than others, which brings to light a previously unexplored aspect of the tracking algorithms. The relative rank of the algorithms based on their performance on benchmark datasets may change in the presence of noise. Our study concludes that no single tracker is able to achieve the same efficiency in the presence of noise as under noise-free conditions; thus, there is a need to include a parameter for robustness to noise when evaluating newly proposed tracking algorithms., Comment: 26 pages, 10 figures, 3 tables
Published: 2018

16. Comparative Study of ECO and CFNet Trackers in Noisy Environment

Author: Fiaz, Mustansar, Javed, Sajid, Mahmood, Arif, and Jung, Soon Ki
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Object tracking is one of the most challenging task and has secured significant attention of computer vision researchers in the past two decades. Recent deep learning based trackers have shown good performance on various tracking challenges. A tracking method should track objects in sequential frames accurately in challenges such as deformation, low resolution, occlusion, scale and light variations. Most trackers achieve good performance on specific challenges instead of all tracking problems, hence there is a lack of general purpose tracking algorithms that can perform well in all conditions. Moreover, performance of tracking techniques has not been evaluated in noisy environments. Visual object tracking has real world applications and there is good chance that noise may get added during image acquisition in surveillance cameras. We aim to study the robustness of two state of the art trackers in the presence of noise including Efficient Convolutional Operators (ECO) and Correlation Filter Network (CFNet). Our study demonstrates that the performance of these trackers degrades as the noise level increases, which demonstrate the need to design more robust tracking algorithms., Comment: 4 pages, 5 figures
Published: 2018

17. Gender Identification using MFCC for Telephone Applications - A Comparative Study

Author: Ahmad, Jamil, Fiaz, Mustansar, Kwon, Soon-il, Sodanil, Maleerat, Vo, Bay, and Baik, Sung Wook
Subjects: Computer Science - Sound
Abstract: Gender recognition is an essential component of automatic speech recognition and interactive voice response systems. Determining gender of the speaker reduces the computational burden of such systems for any further processing. Typical methods for gender recognition from speech largely depend on features extraction and classification processes. The purpose of this study is to evaluate the performance of various state-of-the-art classification methods along with tuning their parameters for helping selection of the optimal classification methods for gender recognition tasks. Five classification schemes including k-nearest neighbor, na\"ive Bayes, multilayer perceptron, random forest, and support vector machine are comprehensively evaluated for determination of gender from telephonic speech using the Mel-frequency cepstral coefficients. Different experiments were performed to determine the effects of training data sizes, length of the speech streams, and parameter tuning on classification performance. Results suggest that SVM is the best classifier among all the five schemes for gender recognition.
Published: 2016

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

17 results on '"Fiaz, Mustansar"'

1. CLIP meets DINO for Tuning Zero-Shot Classifier using Unlabeled Image Collections

2. COSNet: A Novel Semantic Segmentation Network using Enhanced Boundaries in Cluttered Scenes

3. Improving 3D Medical Image Segmentation at Boundary Regions using Local Self-attention and Global Volume Mixing

4. A Survey of the Self Supervised Learning Mechanisms for Vision Transformers

5. FANet: Feature Amplification Network for Semantic Segmentation in Cluttered Background

6. Medical Image Segmentation Using Directional Window Attention

7. ChangeBind: A Hybrid Change Encoder for Remote Sensing Change Detection

8. ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection

9. DDAM-PS: Diligent Domain Adaptive Mixer for Person Search

10. SA2-Net: Scale-aware Attention Network for Microscopic Image Segmentation

11. Remote Sensing Change Detection With Transformers Trained from Scratch

12. PS-ARM: An End-to-End Attention-aware Relation Mixer Network for Person Search

13. Brain MRI Segmentation using Rule-Based Hybrid Approach

14. Handcrafted and Deep Trackers: Recent Visual Object Tracking Approaches and Trends

15. Tracking Noisy Targets: A Review of Recent Object Tracking Approaches

16. Comparative Study of ECO and CFNet Trackers in Noisy Environment

17. Gender Identification using MFCC for Telephone Applications - A Comparative Study

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Publication Type

Database

17 results on '"Fiaz, Mustansar"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources