Author: "Paliwal, Kuldip K." - Searchworks@Jio Institute Digital Library Search Results

Your search for "Paliwal, Kuldip K." returned 344 results


1. Deep Learning-Based Single-Ended Objective Quality Measures for Time-Scale Modified Audio

Author: Roberts, Timothy, Nicolson, Aaron, and Paliwal, Kuldip K.
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound, Electrical Engineering and Systems Science - Signal Processing
Abstract: Objective evaluation of audio processed with Time-Scale Modification (TSM) is seeing a resurgence of interest. Recently, a labelled time-scaled audio dataset was used to train an objective measure for TSM evaluation. This double-ended (DE) measure was an extension of Perceptual Evaluation of Audio Quality, and required reference and test signals. In this paper, two single-ended objective quality measures for time-scaled audio are proposed that do not require a reference signal. Data-driven features are created by either a convolutional neural network (CNN) or a bidirectional gated recurrent unit (BGRU) network and fed to a fully-connected network to predict subjective mean opinion scores. The proposed CNN and BGRU measures achieve an average Root Mean Squared Error of 0.608 and 0.576, and a mean Pearson correlation of 0.771 and 0.794, respectively. The proposed measures are used to evaluate TSM algorithms, and comparisons are provided for 16 TSM implementations. The objective measure is available at https://www.github.com/zygurt/TSM.
Comment: 13 pages, 11 figures, Submitted to The Journal of the Acoustical Society of America
Published: 2020

2. An Objective Measure of Quality for Time-Scale Modification of Audio

Author: Roberts, Timothy and Paliwal, Kuldip K.
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: Objective evaluation of audio processed with Time-Scale Modification (TSM) remains an open problem. Recently, a dataset of time-scaled audio with subjective quality labels was published and used to create an initial objective measure of quality. In this paper, an improved objective measure of quality for time-scaled audio is proposed. The measure uses hand-crafted features and a fully connected network to predict subjective mean opinion scores. Basic and Advanced Perceptual Evaluation of Audio Quality features are used in addition to nine features specific to TSM artefacts. Six methods of alignment are explored, with interpolation of the reference magnitude spectrum to the length of the test magnitude spectrum giving the best performance. The proposed measure achieves a mean Root Mean Squared Error of 0.487 and a mean Pearson correlation of 0.865, equivalent to the 98th and 82nd percentiles of subjective sessions respectively. The proposed measure is used to evaluate time-scale modification algorithms, finding that Elastique gives the highest objective quality for solo instrument and voice signals, while the Identity Phase-Locking Phase Vocoder gives the highest objective quality for music signals and the best overall quality. The objective measure is available at https://www.github.com/zygurt/TSM.
Comment: 12 pages, 7 figures, Submitted to The Journal of the Acoustical Society of America, Currently under review
Published: 2020
Full Text: View/download PDF

3. A time-scale modification dataset with subjective quality labels

Author: Roberts, Timothy and Paliwal, Kuldip K.
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: Time Scale Modification (TSM) is a well-researched field; however, no effective objective measure of quality exists. This paper details the creation, subjective evaluation, and analysis of a dataset for use in the development of an objective measure of quality for TSM. Comprised of two parts, the training component contains 88 source files processed using six TSM methods at 10 time scales, while the testing component contains 20 source files processed using three additional methods at four time scales. The source material contains speech, solo harmonic and percussive instruments, sound effects, and a range of music genres. Ratings (42 529) were collected from 633 sessions using laboratory and remote collection methods. Analysis of results shows no correlation between age and quality of rating; expert and non-expert listeners to be equivalent; minor differences between participants with and without hearing issues; and minimal differences between testing modalities. A comparison of published objective measures and subjective scores shows the objective measures to be poor indicators of subjective quality. Initial results for a retrained objective measure of quality are presented with results approaching average root mean squared error loss and Pearson correlation values of subjective sessions. The labeled dataset is available at http://ieee-dataport.org/1987.
Comment: 12 Pages, 13 Figures, Published in The Journal of the Acoustical Society of America (Vol.148, Issue 1), For associated dataset, see http://ieee-dataport.org/1987
Published: 2020
Full Text: View/download PDF

4. Deep Residual-Dense Lattice Network for Speech Enhancement

Author: Nikzad, Mohammad, Nicolson, Aaron, Gao, Yongsheng, Zhou, Jun, Paliwal, Kuldip K., and Shang, Fanhua
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Machine Learning, Computer Science - Sound, Statistics - Machine Learning
Abstract: Convolutional neural networks (CNNs) with residual links (ResNets) and causal dilated convolutional units have been the network of choice for deep learning approaches to speech enhancement. While residual links improve gradient flow during training, feature diminution of shallow layer outputs can occur due to repetitive summations with deeper layer outputs. One strategy to improve feature re-usage is to fuse both ResNets and densely connected CNNs (DenseNets). DenseNets, however, over-allocate parameters for feature re-usage. Motivated by this, we propose the residual-dense lattice network (RDL-Net), which is a new CNN for speech enhancement that employs both residual and dense aggregations without over-allocating parameters for feature re-usage. This is managed through the topology of the RDL blocks, which limit the number of outputs used for dense aggregations. Our extensive experimental investigation shows that RDL-Nets are able to achieve a higher speech enhancement performance than CNNs that employ residual and/or dense aggregations. RDL-Nets also use substantially fewer parameters and have a lower computational requirement. Furthermore, we demonstrate that RDL-Nets outperform many state-of-the-art deep learning approaches to speech enhancement.
Comment: 8 pages, Accepted by AAAI-2020
Published: 2020

5. Monaural Speech Enhancement Using a Multi-Branch Temporal Convolutional Network

Author: Zhang, Qiquan, Nicolson, Aaron, Wang, Mingjiang, Paliwal, Kuldip K., and Wang, Chenxu
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Electrical Engineering and Systems Science - Signal Processing
Abstract: Deep learning has achieved substantial improvement on single-channel speech enhancement tasks. However, the performance of multi-layer perceptron (MLP)-based methods is limited by their ability to capture long-term history information. Recurrent neural networks (RNNs), e.g., the long short-term memory (LSTM) model, are able to capture long-term temporal dependencies, but come with the issues of high latency and complexity of training. To address these issues, the temporal convolutional network (TCN) was proposed to replace RNNs in various sequence modeling tasks. In this paper we propose a novel TCN model that employs a multi-branch structure, called multi-branch TCN (MB-TCN), for monaural speech enhancement. The MB-TCN exploits a split-transform-aggregate design, which is expected to obtain strong representational power at a low computational complexity. Inspired by the TCN, the MB-TCN model incorporates one-dimensional causal dilated CNNs and residual learning to expand receptive fields for capturing long-term temporal contextual information. Our extensive experimental investigation suggests that the MB-TCNs outperform the residual long short-term memory networks (ResLSTMs), temporal convolutional networks (TCNs), and the CNN networks that employ dense aggregations in terms of speech intelligibility and quality, while providing superior parameter efficiency. Furthermore, our experimental results demonstrate that our proposed MB-TCN model is able to outperform multiple state-of-the-art deep learning-based speech enhancement methods in terms of five widely used objective metrics.
Comment: There are some inappropriate descriptions. These descriptions exist on many pages
Published: 2019

6. Sum-Product Networks for Robust Automatic Speaker Identification

Author: Nicolson, Aaron and Paliwal, Kuldip K.
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: We introduce sum-product networks (SPNs) for robust speech processing through a simple robust automatic speaker identification (ASI) task. SPNs are deep probabilistic graphical models capable of answering multiple probabilistic queries. We show that SPNs are able to remain robust by using the marginal probability density function (PDF) of the spectral features that reliably represent speech. Though current SPN toolkits and learning algorithms are in their infancy, we aim to show that SPNs have the potential to become a useful tool for robust speech processing in the future. SPN speaker models are evaluated here on real-world non-stationary and coloured noise sources at multiple signal-to-noise ratio (SNR) levels. In terms of ASI accuracy, we find that SPN speaker models are more robust than two recent convolutional neural network (CNN)-based ASI systems. Additionally, SPN speaker models consist of significantly fewer parameters than their CNN-based counterparts. The results indicate that SPN speaker models could be a robust, parameter-efficient alternative for ASI. Additionally, this work demonstrates that SPNs have potential in related tasks, such as robust automatic speech recognition (ASR) and automatic speaker verification (ASV). Availability: The SPN ASI system is available at https://github.com/anicolson/SPN-ASI.
Comment: Proc. Interspeech 2020
Published: 2019

7. Deep Xi as a Front-End for Robust Automatic Speech Recognition

Author: Nicolson, Aaron and Paliwal, Kuldip K.
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound, Electrical Engineering and Systems Science - Signal Processing
Abstract: Current front-ends for robust automatic speech recognition (ASR) include masking- and mapping-based deep learning approaches to speech enhancement. A recently proposed deep learning approach to a priori SNR estimation, called Deep Xi, was able to produce enhanced speech at a higher quality and intelligibility than current masking- and mapping-based approaches. Motivated by this, we investigate Deep Xi as a front-end for robust ASR. Deep Xi is evaluated using real-world non-stationary and coloured noise sources at multiple SNR levels. Our experimental investigation shows that Deep Xi as a front-end is able to produce a lower word error rate than recent masking- and mapping-based deep learning front-ends. The results presented in this work show that Deep Xi is a viable front-end, and is able to significantly increase the robustness of an ASR system. Availability: Deep Xi is available at: https://github.com/anicolson/DeepXi
Published: 2019

8. On supervised LPC estimation training targets for augmented Kalman filter-based speech enhancement

Author: Roy, Sujan Kumar, Nicolson, Aaron, and Paliwal, Kuldip K.
Published: 2022
Full Text: View/download PDF

9. Robustness and sensitivity metrics-based tuning of the augmented Kalman filter for single-channel speech enhancement

Author: Roy, Sujan Kumar and Paliwal, Kuldip K.
Published: 2022
Full Text: View/download PDF

10. Masked multi-head self-attention for causal speech enhancement

Author: Nicolson, Aaron and Paliwal, Kuldip K.
Published: 2020
Full Text: View/download PDF

11. A noise PSD estimation algorithm using derivative-based high-pass filter in non-stationary noise conditions

Author: Roy, Sujan Kumar and Paliwal, Kuldip K.
Published: 2021
Full Text: View/download PDF

12. Deep learning for minimum mean-square error approaches to speech enhancement

Author: Nicolson, Aaron and Paliwal, Kuldip K.
Published: 2019
Full Text: View/download PDF

13. Robustness metric-based tuning of the augmented Kalman filter for the enhancement of speech corrupted with coloured noise

Author: George, Aidan E.W., So, Stephen, Ghosh, Ratna, and Paliwal, Kuldip K.
Published: 2018
Full Text: View/download PDF

14. Speech Enhancement Using the Short-Time Discrete Cosine Transform

Author: Paliwal, Kuldip K, So, Stephen, Busch, Andrew W, and Shi, Sisi
Abstract: The importance of speech enhancement (SE) is increased by its numerous real-life applications in hearing aids, speech recognition, speech coding, and cochlear implants. Transform-based speech enhancement algorithm (SEA), in particular, is the most popular, as the noise energy can be more easily distinguished and removed from the speech signal in the transform domain. Most speech enhancement systems use discrete Fourier transform (DFT), which has readily available short-time spectral amplitude (STSA) estimators. There are limitations with regard to the conventional DFT-based STSA estimators. For example, using a noisy phase spectrum for speech re-synthesis introduces an upper bound on the maximum improvement in speech quality; also, no closed-form solution is available in the DFT domain when a super-Gaussian speech prior is used for derivation. The approach presented here alleviates such limitations by using discrete cosine transform (DCT) instead of DFT, thereby improving perceived speech quality. Therefore, this thesis focuses on demonstrating the superiority of DCT representation in the context of STSA-estimation-based speech enhancement. As a foundation of the research, we first evaluate the relevance of DCT spectra towards perceived speech quality using objective and subjective testing. Consequently, the use of noisy polarity spectrum (PoS) for speech re-synthesis is justified. In the second stage of the research, we develop the optimal minimum mean square error (MMSE) estimators of the DCT STSA due to their primary importance on perceived speech quality. A novel polarity estimator (PoE) is also developed and used with the STSA estimator. Accordingly, we examine the effect of using PoE on the performance of the proposed SE system. Compared to the state-of-the-art DFT-based STSA estimators, the proposed DCT-based approach demonstrates superior performance in enhancing noisy speech as indicated by objective and subjective quality measures.
Comment: Thesis (PhD Doctorate), Doctor of Philosophy (PhD), School of Eng & Built Env, Science, Environment, Engineering and Technology
Published: 2023

15. Protein Structure Prediction using Deep Learning

Author: Paliwal, Kuldip K, So, Stephen, Busch, Andrew W, and Singh, Jaspreet
Abstract: Proteins are the most abundant macromolecules and are essential to all life forms, playing critical roles in many biological processes ranging from catalysts in enzymic reactions, antibodies counteracting antigens, and transmitters of cellular signals, to acting as the structural elements in cells and tissues, and many more. The facet of proteins providing them with this functionality is their three-dimensional (3D) structure. Therefore, knowing the 3D structure of the protein is essential to understand its functionality and role. Recently, AlphaFold2 has achieved atomic-level accuracy in predicting the 3D structure of the protein. This scientific achievement is built on accumulating improvement in protein one-dimensional (1D) and two-dimensional (2D) structural properties prediction along with advances in deep learning techniques. However, this success only partially solves the problem of protein structure prediction as it requires a minimum of 30 homologous sequences for effective prediction, and a large number of proteins lack homologous sequences. Consequently, it is still essential to improve the accuracy of protein secondary structure and inter-residue contact prediction for all proteins, specifically those without homologous sequences. This thesis explores several deep learning approaches, input features and improved training strategies for the prediction of protein secondary structure, 1D structural properties and inter-residue contact maps. It demonstrates that leveraging large high-sequence-identity training sets and protein language models as input features enhances the prediction for all test sets, including those proteins which lack homologous sequences. There are mainly four research studies performed in this thesis.
Comment: Thesis (PhD Doctorate), Doctor of Philosophy (PhD), School of Eng & Built Env, Science, Environment, Engineering and Technology
Published: 2023

16. Hy-Tracker: A Novel Framework for Enhancing Efficiency and Accuracy of Object Tracking in Hyperspectral Videos

Author: Islam, Mohammad Aminul, Xing, Wangzhi, Zhou, Jun, Gao, Yongsheng, and Paliwal, Kuldip K.
Abstract: Hyperspectral object tracking has recently emerged as a topic of great interest in the remote sensing community. The hyperspectral image, with its many bands, provides a rich source of material information of an object that can be effectively used for object tracking. While most hyperspectral trackers are based on detection-based techniques, no one has yet attempted to employ YOLO for detecting and tracking the object. This is due to the presence of multiple spectral bands, the scarcity of annotated hyperspectral videos, and YOLO's performance limitations in managing occlusions and distinguishing objects in cluttered backgrounds. Therefore, in this paper, we propose a novel framework called Hy-Tracker, which aims to bridge the gap between hyperspectral data and state-of-the-art object detection methods to leverage the strengths of YOLOv7 for object tracking in hyperspectral videos. Hy-Tracker not only introduces YOLOv7 but also innovatively incorporates a refined tracking module on top of YOLOv7. The tracker refines the initial detections produced by YOLOv7, leading to improved object-tracking performance. Furthermore, we incorporate a Kalman filter into the tracker, which addresses the challenges posed by scale variation and occlusion. The experimental results on hyperspectral benchmark datasets demonstrate the effectiveness of Hy-Tracker in accurately tracking objects across frames.
Published: 2023

17. Distance-based contact maps prediction for RNA bases using deep neural networks and single sequence features

Author: Rashid, Mahmood A. and Paliwal, Kuldip K.
Abstract: RNA molecules play critical roles in various biological processes, which are predominantly governed by their secondary and tertiary structures. The secondary structure of RNA helps us understand the functional behaviours and regulatory mechanisms of the RNA molecules. Although experimental methods can determine highly accurate structures, those methods are expensive, time consuming and labour intensive. As a result, the gap between the number of known sequences and the number of known structures is increasing rapidly. The recent advancements in artificial intelligence and the increasing number of known structures encourage researchers to build deep learning models to predict RNA structures, aiming to reduce this gap. Towards finding an efficient deep learning architecture, we implemented convolutional neural networks based on the VGG16, VGG19, AlexNet, ResNet and GoogLeNet architectures and trained them on single-sequence RNA features. Along with its superior performance over the other architectures, we found that the GoogLeNet-based model improves the F1 scores (validation F1 = 0.74 and test F1 = 0.66) in comparison to the state-of-the-art F1 scores (validation F1 = 0.71 and test F1 = 0.64) for both validation and test datasets.
Published: 2024
Full Text: View/download PDF

18. Hy-Tracker: A Novel Framework for Enhancing Efficiency and Accuracy of Object Tracking in Hyperspectral Videos

Author: Islam, Mohammad Aminul, Xing, Wangzhi, Zhou, Jun, Gao, Yongsheng, and Paliwal, Kuldip K.
Abstract: Hyperspectral images, with their many spectral bands, provide a rich source of material information about an object that can be effectively used for object tracking. However, many trackers in this domain rely on detection-based techniques, which often perform suboptimally in challenging scenarios such as managing occlusions and distinguishing objects in cluttered backgrounds. This underperformance is primarily due to the presence of multiple spectral bands and the inability to leverage this abundance of data for effective tracking. Additionally, the scarcity of annotated hyperspectral videos and the absence of comprehensive temporal information exacerbate these difficulties, further limiting the effectiveness of current tracking methods. To address these challenges, this article introduces the novel Hy-Tracker framework, designed to bridge the gap between hyperspectral data and state-of-the-art object detection methods. Our approach leverages the strengths of YOLOv7 for object tracking in hyperspectral videos, enhancing both accuracy and robustness in complex scenarios. The Hy-Tracker framework comprises two key components. We introduce a hierarchical attention for band selection (HAS-BS) that selectively processes and groups the most informative spectral bands, thereby significantly improving detection accuracy. Additionally, we have developed a refined tracker that refines the initial detections by incorporating a classifier and a temporal network using gated recurrent units (GRUs). The classifier distinguishes similar objects, while the temporal network models temporal dependencies across frames for robust performance despite occlusions and scale variations (SVs). Experimental results on hyperspectral benchmark datasets demonstrate the effectiveness of Hy-Tracker in accurately tracking objects across frames and overcoming the challenges inherent in detection-based hyperspectral object tracking (HOT).
Published: 2024
Full Text: View/download PDF

19. A deterministic approach to regularized linear discriminant analysis

Author: Sharma, Alok and Paliwal, Kuldip K.
Published: 2015
Full Text: View/download PDF

20. Kalman Filter with Sensitivity Tuning for Improved Noise Reduction in Speech

Author: So, Stephen, George, Aidan E. W., Ghosh, Ratna, and Paliwal, Kuldip K.
Published: 2017
Full Text: View/download PDF

21. Quantization of Speech Features: Source Coding

Author: So, Stephen, Paliwal, Kuldip K., Singh, Sameer, editor, Tan, Zheng-Hua, and Lindberg, Børge
Published: 2008
Full Text: View/download PDF

22. RNA Structure Prediction using Deep Neural Network Architectures and Improved Evolutionary Profiles

Author: Paliwal, Kuldip K, So, Stephen, and Singh, Jaswinder
Abstract: RNAs are important biological macro-molecules that play critical roles in many biological processes. The functionality of RNA depends on its three-dimensional (3D) structure, which further depends on its primary structure, i.e. the order of sequence of nucleotides in the RNA chain. Direct prediction of the 3D structure of an RNA from its sequence is a challenging task. Therefore, the 3D structure is further divided into two-dimensional (2D) properties such as secondary structure, contact maps and one-dimensional (1D) properties such as torsion angles and solvent accessibility. An accurate prediction of these 1D and 2D structural properties will increase the accuracy in predicting the 3D structure of the RNA. This thesis explores various deep learning algorithms and input features relevant to predicting the 1D and 2D structural properties of an RNA. Using these predicted 1D and 2D structural properties further as restraints, we have demonstrated an improvement in the prediction of the RNA 3D structure. There are four primary studies performed in this thesis for RNA structural properties prediction. The first study introduces two methods (SPOT-RNA and SPOT-RNA2) for RNA secondary structure prediction using an ensemble of Residual Convolution and Bi-directional LSTM recurrent neural networks. This study shows that deep learning based methods can outperform existing dynamic programming based algorithms and achieve state-of-the-art performance using single-sequence and evolutionary information as input. The second study investigates the application of deep neural networks for predicting RNA backbone torsion and pseudotorsion angles. We have pioneered in predicting the backbone torsion and pseudotorsion angles using deep learning (SPOT-RNA-1D). The angles predicted using SPOT-RNA-1D could be used as 3D model quality indicators. The third study introduces a method (SPOT-RNA-2D) to predict RNA distance-based contact maps using an ensemble of deep neural networks and improv
Comment: Thesis (PhD Doctorate), Doctor of Philosophy (PhD), School of Eng & Built Env, Science, Environment, Engineering and Technology
Published: 2022

23. Linear discriminant analysis for the small sample size problem: an overview

Author: Sharma, Alok and Paliwal, Kuldip K.
Published: 2015
Full Text: View/download PDF

24. Modulation-domain Kalman filtering for single-channel speech enhancement

Author: So, Stephen and Paliwal, Kuldip K.
Published: 2011
Full Text: View/download PDF

25. Suppressing the influence of additive noise on the Kalman gain for low residual noise speech enhancement

Author: So, Stephen and Paliwal, Kuldip K.
Published: 2011
Full Text: View/download PDF

26. Monaural Speech Enhancement Using a Multi-Branch Temporal Convolutional Network

Author: Zhang, Qiquan, primary, Qian, Xinyuan, additional, Nicolson, Aaron, additional, Wang, Chenxu, additional, and Paliwal, Kuldip K., additional
Published: 2022
Full Text: View/download PDF

27. A feature selection method using improved regularized linear discriminant analysis

Author: Sharma, Alok, Paliwal, Kuldip K., Imoto, Seiya, and Miyano, Satoru
Published: 2014
Full Text: View/download PDF

28. Principal component analysis using QR decomposition

Author: Sharma, Alok, Paliwal, Kuldip K., Imoto, Seiya, and Miyano, Satoru
Published: 2013
Full Text: View/download PDF

29. Deep Learning-Based Single-Ended Quality Prediction for Time-Scale Modified Audio

Author: Roberts, Timothy, primary, Nicolson, Aaron, additional, and Paliwal, Kuldip K., additional
Published: 2021
Full Text: View/download PDF

30. Robustness and Sensitivity Tuning of the Kalman Filter for Speech Enhancement

Author: Roy, Sujan Kumar, primary and Paliwal, Kuldip K., additional
Published: 2021
Full Text: View/download PDF

31. On training targets for deep learning approaches to clean speech magnitude spectrum estimation

Author: Nicolson, Aaron, primary and Paliwal, Kuldip K., additional
Published: 2021
Full Text: View/download PDF

32. Kalman Filtering with Machine Learning Methods for Speech Enhancement

Author: Paliwal, Kuldip K, So, Stephen, and Roy, Sujan K
Abstract: Speech corrupted by background noise (or noisy speech) can reduce the efficiency of communication between man-man and man-machine. A speech enhancement algorithm (SEA) can be used to suppress the embedded background noise and increase the quality and intelligibility of noisy speech. Many applications, such as speech communication systems, hearing aid devices, and speech recognition systems, typically rely upon speech enhancement algorithms for robustness. This dissertation focuses on single-channel speech enhancement using Kalman filtering with machine learning methods. In Kalman filter (KF)-based speech enhancement, each clean speech frame is represented by an auto-regressive (AR) process, whose parameters comprise the linear prediction coefficients (LPCs) and prediction error variance. The LPC parameters and the additive noise variance are used to form the recursive equations of the KF. In augmented KF (AKF), both the clean speech and additive noise LPC parameters are incorporated into an augmented matrix to construct the recursive equations of AKF. Given a frame of noisy speech samples, the KF and AKF give a linear MMSE estimate of the clean speech samples using the recursive equations. Usually, the inaccurate estimates of the parameters introduce bias in the KF and AKF gain, leading to a degradation in speech enhancement performance. The research contributions in this dissertation can be grouped into three focus areas. In the first work, we propose an iterative KF (IT-KF) to offset the bias in KF gain for speech enhancement through utilizing the parameters in real-life noise conditions. In the second work, we jointly incorporate the robustness and sensitivity metrics to offset the bias in the KF and AKF gain - which address speech enhancement in real-life noise conditions. The third focus area consists of the deep neural network (DNN) and whitening filter assisted KF and AKF for speech enhancement. Specifically, DNN and whitening filter-based approaches utilize
Comment: Thesis (PhD Doctorate), Doctor of Philosophy (PhD), School of Eng & Built Env, Science, Environment, Engineering and Technology
Published: 2021

33. Design of Objective Quality Measures for Time-Scale Modification of Audio

Author: Paliwal, Kuldip K, Busch, Andrew W, and Roberts, Timothy
Abstract: Time-Scale Modification (TSM) is a well-researched field and allows for time-domain manipulation of a signal without modifying the pitch or timbre. Many TSM methods have been presented, however quantitative results on the quality of these methods are rare, with most methods reporting informal listening tests. This is likely due to the time commitment and cost of subjective testing. Additionally, an objective measure of quality has not yet been developed that is suitable for time-scaled signals. This dissertation describes the design of effective objective measures of quality for TSM. TSM methods are, generally, single channel algorithms that give poor results when applied to multi-channel signals, as the phase relationship between channels must be maintained. This dissertation proposes a method and additional variant for maintaining the phase relationship between channels and retaining the presence in the centre of the stereo signal. The method involves pre- and post-processing the signal, with the variant processing each frame for real-time suitability. Sum and difference transformations of the stereo signal are used for TSM and result in a large improvement in stereo phase coherence, consequently maintaining the stereo field. The proposed method produces a high-quality stereo output and greatly improves quality over the independent channel processing method. It also allows for simple implementation around all existing TSM frameworks. A modification to the Epoch-Synchronous Overlap-Add (ESOLA) TSM algorithm is proposed in this dissertation. The proposed method, Fuzzy Epoch-Synchronous Overlap-Add, improves on the previous ESOLA method through cross-correlation of time-smeared epochs before overlap-adding. This reduces distortion and artefacts while the speaker's fundamental frequency is stable, as well as reducing artefacts during pitch modulation. The proposed method is tested against well-known TSM algorithms. It is preferred over ESOLA and gives similar performance t
Comment: Thesis (PhD Doctorate), Doctor of Philosophy (PhD), School of Eng & Built Env, Science, Environment, Engineering and Technology
Published: 2021

34. Cancer classification by gradient LDA technique using microarray gene expression data

Author: Sharma, Alok and Paliwal, Kuldip K.
Published: 2008
Full Text: View/download PDF

35. On Training Targets for Supervised LPC Estimation to Augmented Kalman Filter-based Speech Enhancement

Author: Roy, Sujan Kumar, primary, Nicolson, Aaron, primary, and Paliwal, Kuldip K., primary
Published: 2021
Full Text: View/download PDF

36. DeepLPC-MHANet: Multi-Head Self-Attention for Augmented Kalman Filter-based Speech Enhancement

Author: Roy, Sujan Kumar, primary, Nicolson, Aaron, primary, and Paliwal, Kuldip K., primary
Published: 2021
Full Text: View/download PDF

37. DeepLPC: A Deep Learning Approach to Augmented Kalman Filter-Based Single-Channel Speech Enhancement

Author: Roy, Sujan Kumar, primary, Nicolson, Aaron, primary, and Paliwal, Kuldip K., primary
Published: 2021
Full Text: View/download PDF

38. An objective measure of quality for time-scale modification of audio

Author: Roberts, Timothy, primary and Paliwal, Kuldip K., additional
Published: 2021
Full Text: View/download PDF

39. A Gradient Linear Discriminant Analysis for Small Sample Sized Problem

Author: Sharma, Alok and Paliwal, Kuldip K.
Published: 2008
Full Text: View/download PDF

40. Rotational linear discriminant analysis technique for dimensionality reduction

Author: Sharma, Alok and Paliwal, Kuldip K.
Subjects: Discriminant analysis -- Reports, Factor analysis -- Reports, Error-correcting codes -- Analysis, Transformations (Mathematics) -- Analysis, Business, Computers, Electronics, Electronics and electrical industries
Abstract: The linear discriminant analysis (LDA) technique is very popular in pattern recognition for dimensionality reduction. It is a supervised learning technique that finds a linear transformation such that the overlap between the classes is minimum for the projected feature vectors in the reduced feature space. This overlap, if present, adversely affects the classification performance. In this paper, we introduce prior to dimensionality-reduction transformation an additional rotational transform that rotates the feature vectors in the original feature space around their respective class centroids in such a way that the overlap between the classes in the reduced feature space is further minimized. As a result, the classification performance significantly improves, which is demonstrated using several data corpuses. Index Terms--Rotational linear discriminant analysis, dimensionality reduction, classification error, fixed-point algorithm, probability of error.
Published: 2008

41. Detecting masquerades using a combination of Naïve Bayes and weighted RBF approach

Author: Sharma, Alok and Paliwal, Kuldip K.
Published: 2007
Full Text: View/download PDF

42. Deep Learning for Minimum Mean-Square Error and Missing Data Approaches to Robust Speech Processing

Author: Paliwal, Kuldip K, So, Stephen, and Nicolson, Aaron M
Abstract: Full Text, Thesis (PhD Doctorate), Doctor of Philosophy (PhD), School of Eng & Built Env, Science, Environment, Engineering and Technology, Speech corrupted by background noise (or noisy speech) can cause misinterpretation and fatigue during phone and conference calls, and for hearing aid users. Noisy speech can also severely impact the performance of speech processing systems such as automatic speech recognition (ASR), automatic speaker verification (ASV), and automatic speaker identification (ASI) systems. Currently, deep learning approaches are employed in an end-to-end fashion to improve robustness. The target speech (or clean speech) is used as the training target or large noisy speech datasets are used to facilitate multi-condition training. In this dissertation, we propose competitive alternatives to the preceding approaches by updating two classic robust speech processing techniques using deep learning. The two techniques include minimum mean-square error (MMSE) and missing data approaches. An MMSE estimator aims to improve the perceived quality and intelligibility of noisy speech. This is accomplished by suppressing any background noise without distorting the speech. Prior to the introduction of deep learning, MMSE estimators were the standard speech enhancement approach. MMSE estimators require the accurate estimation of the a priori signal-to-noise ratio (SNR) to attain a high level of speech enhancement performance. However, current methods produce a priori SNR estimates with a large tracking delay and a considerable amount of bias. Hence, we propose a deep learning approach to a priori SNR estimation that is significantly more accurate than previous estimators, called Deep Xi. 
Through objective and subjective testing across multiple conditions, such as real-world non-stationary and coloured noise sources at multiple SNR levels, we show that Deep Xi allows MMSE estimators to produce the highest quality enhanced speech amongst all clean speech magnitude spectrum estimators. Missing data approaches improve robustness by performing inference only on noisy speech features that reliably represent
Published: 2020

43. Short-time phase spectrum in speech processing: A review and some experimental results

Author: Alsteris, Leigh D. and Paliwal, Kuldip K.
Published: 2007
Full Text: View/download PDF

44. A comparative study of LPC parameter representations and quantisation schemes for wideband speech coding

Author: So, Stephen and Paliwal, Kuldip K.
Published: 2007
Full Text: View/download PDF

45. Efficient product code vector quantisation using the switched split vector quantiser

Author: So, Stephen and Paliwal, Kuldip K.
Published: 2007
Full Text: View/download PDF

46. Iterative reconstruction of speech from short-time Fourier transform phase and magnitude spectra

Author: Alsteris, Leigh D. and Paliwal, Kuldip K.
Published: 2007
Full Text: View/download PDF

47. Rotational linear discriminant analysis using Bayes rule for dimensionality reduction

Author: Sharma, Alok and Paliwal, Kuldip K.
Subjects: Decision theory -- Usage, Discriminant analysis, Factor analysis, Computers
Abstract: Linear discriminant analysis (LDA) finds an orientation that projects high dimensional feature vectors to reduced dimensional feature space in such a way that the overlapping between the classes in [...]
Published: 2006

48. Splitting technique initialization in local PCA

Author: Sharma, Alok, Paliwal, Kuldip K., and Onwubolu, Godfrey C.
Subjects: Principal components analysis -- Usage, Vector analysis -- Usage, Computers
Abstract: The local Principal Component Analysis (PCA) reduces linearly redundant components that may present in higher dimensional space. It deploys an initial guess technique which can be utilized when the [...]
Published: 2006

49. Pattern classification: an improvement using combination of VQ and PCA based techniques

Author: Sharma, Alok, Paliwal, Kuldip K., and Onwubolu, Godfrey C.
Subjects: Object recognition (Computers) -- Analysis, Pattern recognition -- Analysis, Vector analysis -- Usage, Parameter estimation, Principal components analysis, Science and technology
Abstract: This study firstly presents a survey on basic classifiers namely minimum distance classifier (MDC), vector quantization (VQ), principal component analysis (PCA), nearest neighbour (NN) and k-nearest neighbour (kNN). Then [...]
Published: 2005

50. Causal Convolutional Neural Network-Based Kalman Filter for Speech Enhancement

Author: Roy, Sujan Kumar, primary and Paliwal, Kuldip K., additional
Published: 2020
Full Text: View/download PDF
