11 results for "Sanjeev Khudanpur"
Search Results
2. LET-Decoder: A WFST-Based Lazy-Evaluation Token-Group Decoder With Exact Lattice Generation
- Authors
Mahsa Yarmohammadi, Daniel Povey, Hang Lv, Li Ke, Lei Xie, Yiming Wang, and Sanjeev Khudanpur
- Subjects
Computer science, Applied Mathematics, Frame (networking), Security token, Token passing, Signal Processing, Overhead (computing), Electrical and Electronic Engineering, Lazy evaluation, Hidden Markov model, Algorithm, Word (computer architecture), Decoding methods
- Abstract
We propose a novel lazy-evaluation token-group decoding algorithm with on-the-fly composition of weighted finite-state transducers (WFSTs) for large-vocabulary continuous speech recognition. In the standard on-the-fly composition decoder, a base WFST and one or more incremental WFSTs are composed during decoding, and a token-passing algorithm is then employed to generate the lattice on the composed search space, resulting in substantial computational overhead. To improve speed, the proposed algorithm adopts (1) a token-group method, which groups tokens sharing the same state in the base WFST on each frame and limits the capacity of each group, and (2) a lazy-evaluation method, which does not expand a token group and its source token groups until a word label is processed during decoding. Experiments show that the proposed decoder runs up to three times faster than the standard on-the-fly composition decoder.
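The token-group idea in this abstract can be illustrated with a minimal sketch (the data structures and function names below are hypothetical; the actual decoder operates on WFST states and lattices): active tokens that share a base-WFST state are collected into one group, and each group's capacity is capped by keeping only the lowest-cost tokens.

```python
import heapq

def prune_token_group(tokens, capacity):
    """Keep only the `capacity` best (lowest-cost) tokens in a group."""
    return heapq.nsmallest(capacity, tokens, key=lambda t: t[1])

def group_tokens_by_state(tokens, capacity):
    """Group (state, cost) tokens by their base-WFST state and cap each
    group's size, dropping the highest-cost tokens -- a toy analogue of
    the token-group method."""
    groups = {}
    for state, cost in tokens:
        groups.setdefault(state, []).append((state, cost))
    return {s: prune_token_group(ts, capacity) for s, ts in groups.items()}

# three tokens share base state 0; with capacity 2 the worst is dropped
tokens = [(0, 1.5), (0, 0.2), (0, 3.0), (1, 0.9)]
grouped = group_tokens_by_state(tokens, capacity=2)
```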
- Published
- 2021
3. Flat-Start Single-Stage Discriminatively Trained HMM-Based Models for ASR
- Authors
Sanjeev Khudanpur, Hossein Sameti, Daniel Povey, and Hossein Hadian
- Subjects
Context model, Acoustics and Ultrasonics, Artificial neural network, Computer science, Pipeline (computing), Speech recognition, Mutual information, Reduction (complexity), Computational Mathematics, Computer Science (miscellaneous), Electrical and Electronic Engineering, Hidden Markov model, Decoding methods, Word (computer architecture)
- Abstract
In recent years, end-to-end approaches to automatic speech recognition have received considerable attention because they are much faster in terms of resource preparation. However, conventional multistage approaches, which rely on a pipeline of hidden Markov model (HMM)-GMM training and tree-building steps, still give state-of-the-art results on most databases. In this study, we investigate flat-start single-stage training of neural networks using the lattice-free maximum mutual information (LF-MMI) objective function with HMMs for large-vocabulary continuous speech recognition. We thoroughly examine the issues that arise in such a setup and propose a standalone system that achieves word error rates (WER) comparable with those of state-of-the-art multistage systems while being much faster to prepare. We propose using full biphones to enable flat-start context-dependent (CD) modeling and show experimentally that this CD modeling approach can be almost as effective as regular tree-based CD modeling. We show that our flat-start LF-MMI setup, together with this tree-free CD modeling technique, achieves a 10-25% relative WER reduction compared with other end-to-end methods on well-known databases. The improvements are larger for smaller databases.
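The tree-free CD modeling described above keeps every left-context biphone as its own unit instead of clustering contexts with a tree, which is what makes flat-start training possible. A minimal sketch of enumerating such an inventory (the phone set and the use of a silence left context are illustrative assumptions, not the paper's actual configuration):

```python
def full_biphone_inventory(phones):
    """Enumerate all left-context biphone units (left, phone).
    Full biphones keep every pair -- no tree-based clustering --
    so the inventory can be built before any data is seen."""
    contexts = phones + ["<sil>"]  # hypothetical: phones plus a silence context
    return [(left, p) for left in contexts for p in phones]

# toy phone set: 4 left contexts x 3 phones = 12 CD units
units = full_biphone_inventory(["a", "b", "c"])
```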
- Published
- 2018
4. Low Latency Acoustic Modeling Using Temporal Convolution and LSTMs
- Authors
Vijayaditya Peddinti, Daniel Povey, Sanjeev Khudanpur, and Yiming Wang
- Subjects
Artificial neural network, Microphone, Time delay neural network, Computer science, Applied Mathematics, Speech recognition, Word error rate, Context (language use), Frame rate, Convolution, Signal Processing, Electrical and Electronic Engineering, Latency (engineering)
- Abstract
Bidirectional long short-term memory (BLSTM) acoustic models provide a significant word error rate reduction compared to their unidirectional counterparts, as they model both past and future temporal contexts. However, it is nontrivial to deploy bidirectional acoustic models for online speech recognition due to the increase in latency. In this letter, we propose the use of temporal convolution, in the form of time-delay neural network (TDNN) layers, along with unidirectional LSTM layers to limit the latency to 200 ms. This architecture has been shown to outperform state-of-the-art low frame rate (LFR) BLSTM models. We further improve these LFR BLSTM acoustic models by operating them at higher frame rates at the lower layers and show that the proposed model performs similarly to these mixed frame rate BLSTMs. We present results on the Switchboard 300-hour LVCSR task and the AMI LVCSR task in its three microphone conditions.
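The 200 ms latency figure above comes from the total future (right) context of the network: each TDNN layer that looks some frames ahead adds that many frame shifts of delay, while unidirectional LSTM layers add none. A toy calculation, assuming a hypothetical layer stack and the usual 10 ms frame shift:

```python
def right_context_latency(layer_right_contexts, frame_shift_ms=10):
    """Algorithmic latency of a layer stack: the sum of each layer's
    right (future) context, in frames, times the frame shift.
    Illustrative only -- real deployed latency also depends on
    chunking and feature extraction."""
    return sum(layer_right_contexts) * frame_shift_ms

# hypothetical TDNN right contexts summing to 20 frames -> 200 ms;
# unidirectional LSTM layers contribute 0
latency = right_context_latency([3, 3, 3, 3, 3, 5, 0, 0])
```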
- Published
- 2018
5. A Dataset and Benchmarks for Segmentation and Recognition of Gestures in Robotic Surgery
- Authors
Narges Ahmidi, Lingling Tao, Shahin Sefati, Sanjeev Khudanpur, Colin Lea, Gregory D. Hager, Yixin Gao, Luca Zappella, René Vidal, and Benjamin Bejar Haro
- Subjects
Conditional random field, Engineering, Biomedical Engineering, Machine learning, Field (computer science), Pattern recognition, Activity recognition, Three-dimensional imaging, Robotic surgical procedures, Feature (machine learning), Segmentation, Hidden Markov model, Gestures, Markov chain, Benchmarking, Clinical competence, Artificial intelligence
- Abstract
Objective: State-of-the-art techniques for surgical data analysis report promising results for automated skill assessment and action recognition. The contributions of many of these techniques, however, are limited to study-specific data and validation metrics, making assessment of progress across the field extremely challenging. Methods: In this paper, we address two major problems for surgical data analysis: first, the lack of uniformly shared datasets and benchmarks, and second, the lack of consistent validation processes. We address the former by presenting the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), a public dataset that we have created to support comparative research benchmarking. JIGSAWS contains synchronized video and kinematic data from multiple performances of robotic surgical tasks by operators of varying skill. We address the latter by presenting a well-documented evaluation methodology and reporting results for six techniques for automated segmentation and classification of time-series data on JIGSAWS. These techniques comprise four temporal approaches for joint segmentation and classification: hidden Markov model (HMM), sparse HMM, Markov/semi-Markov conditional random field, and skip-chain conditional random field; and two feature-based approaches that classify fixed segments: bag of spatiotemporal features and linear dynamical systems. Results: Most methods recognize gesture activities with approximately 80% overall accuracy under both leave-one-super-trial-out and leave-one-user-out cross-validation settings. Conclusion: Current methods show promising results on this shared dataset, but room for significant progress remains, particularly for consistent prediction of gesture activities across different surgeons. Significance: The results reported in this paper provide the first systematic and uniform evaluation of surgical activity recognition techniques on the benchmark database.
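The leave-one-user-out protocol mentioned above holds out all trials of one surgeon at a time and trains on everyone else's. A minimal sketch of generating such splits (the trial naming and dictionary layout are hypothetical, not the JIGSAWS file format):

```python
def leave_one_user_out_splits(trials):
    """Yield (held_out_user, train_trials, test_trials) tuples, holding
    out each user's trials in turn -- the leave-one-user-out protocol.
    `trials` maps a user id to that user's list of trials."""
    users = sorted(trials)
    for held_out in users:
        test = trials[held_out]
        train = [t for u in users if u != held_out for t in trials[u]]
        yield held_out, train, test

# toy example with three surgeons
trials = {"B": ["B1", "B2"], "C": ["C1"], "D": ["D1", "D2"]}
splits = list(leave_one_user_out_splits(trials))
```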
- Published
- 2017
6. Stepwise Optimal Subspace Pursuit for Improving Sparse Recovery
- Authors
Sanjeev Khudanpur, Trac D. Tran, and Balakrishnan Varadarajan
- Subjects
Mathematical optimization, Signal reconstruction, Iterative method, Applied Mathematics, Approximation algorithm, Compressed sensing, Signal Processing, Algorithm design, Electrical and Electronic Engineering, Closed-form expression, Greedy algorithm, Algorithm, Mathematics, Sparse matrix
- Abstract
We propose a new iterative algorithm to reconstruct an unknown sparse signal x from a set of projected measurements y = Φx. Unlike existing methods, which rely crucially on the near orthogonality of the sampling matrix Φ, our approach makes stepwise optimal updates even when the columns of Φ are not orthogonal. We invoke a block-wise matrix inversion formula to obtain a closed-form expression for the increase (reduction) in the L2-norm of the residue obtained by removing (adding) a single element from (to) the presumed support of x. We then use this expression to design a computationally tractable algorithm to search for the nonzero components of x. We show that, compared to currently popular sparsity-seeking matching pursuit algorithms, each step of the proposed algorithm is locally optimal with respect to the actual objective function. We demonstrate experimentally that the algorithm significantly outperforms conventional techniques in recovering sparse signals whose nonzero values have exponentially decaying magnitudes or are distributed as N(0,1).
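The stepwise-optimal step described above can be sketched numerically. The paper computes the change in residue in closed form via block-wise matrix inversion; the sketch below instead recomputes the least-squares residue directly for each candidate column, which is slower but shows the same selection rule (all names and the toy problem are illustrative):

```python
import numpy as np

def residual_norm(Phi, y, support):
    """L2 norm of the residue after least-squares projection of y
    onto the columns of Phi indexed by `support`."""
    if not support:
        return float(np.linalg.norm(y))
    A = Phi[:, sorted(support)]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.linalg.norm(y - A @ coef))

def best_column_to_add(Phi, y, support):
    """Pick the column outside the current support whose addition most
    reduces the residue -- the locally optimal greedy step."""
    candidates = set(range(Phi.shape[1])) - set(support)
    return min(candidates,
               key=lambda j: residual_norm(Phi, y, set(support) | {j}))

# toy problem: y is exactly column 2 of a random Phi, so the first
# stepwise-optimal addition recovers index 2
rng = np.random.default_rng(0)
Phi = rng.standard_normal((8, 5))
x = np.zeros(5); x[2] = 1.0
y = Phi @ x
best = best_column_to_add(Phi, y, set())
```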
- Published
- 2011
7. Likelihood-Based Semi-Supervised Model Selection With Applications to Speech Processing
- Authors
Patrick J. Wolfe, Sanjeev Khudanpur, and Christopher White
- Subjects
Computer science, Robust statistics, Word error rate, Machine learning, Semi-supervised learning, Data modeling, Electrical and Electronic Engineering, Hidden Markov model, Model selection, Nonparametric statistics, Pattern recognition, Speech processing, Signal Processing
- Abstract
In conventional supervised pattern recognition tasks, model selection is typically accomplished by minimizing the classification error rate on a set of so-called development data, subject to ground-truth labeling by human experts or some other means. In the context of speech processing systems and other large-scale practical applications, however, such labeled development data are typically costly and difficult to obtain. This article proposes an alternative semi-supervised framework for likelihood-based model selection that leverages unlabeled data by using trained classifiers representing each model to automatically generate putative labels. The errors that result from this automatic labeling are shown to be amenable to results from robust statistics, which in turn provide for minimax-optimal censored likelihood ratio tests that recover the nonparametric sign test as a limiting case. This approach is then validated experimentally using a state-of-the-art automatic speech recognition system to select between candidate word pronunciations using unlabeled speech data that only potentially contain instances of the words under test. Results provide supporting evidence for the utility of this approach, and suggest that it may also find use in other applications of machine learning.
- Published
- 2010
8. Updated MINDS report on speech recognition and understanding, Part 2 [DSP Education]
- Authors
Douglas O'Shaughnessy, Nelson Morgan, James Glass, Sanjeev Khudanpur, J. Baker, Li Deng, and Chin-Hui Lee
- Subjects
Machine translation, Computer science, Applied Mathematics, Speech recognition, Speech processing, Data resources, Signal Processing, Language technology, Electrical and Electronic Engineering, Language translation, Digital signal processing
- Abstract
This article is the second part of an updated version of the "MINDS 2006-2007 Report of the Speech Understanding Working Group," one of five reports emanating from two workshops entitled "Meeting of the MINDS: Future Directions for Human Language Technology," sponsored by the U.S. Disruptive Technology Office (DTO). (MINDS is an acronym for "machine translation, information retrieval, natural-language processing, data resources, and speech understanding").
- Published
- 2009
9. Developments and directions in speech recognition and understanding, Part 1 [DSP Education]
- Authors
J. Baker, Nelson Morgan, James Glass, Sanjeev Khudanpur, Li Deng, Douglas O'Shaughnessy, and Chin-Hui Lee
- Subjects
Knowledge representation and reasoning, Computer science, Applied Mathematics, Speech recognition, Speech processing, Field (computer science), Paradigm shift, Signal Processing, Language technology, Electrical and Electronic Engineering, Digital signal processing
- Abstract
To advance research, it is important to identify promising future research directions, especially those that have not been adequately pursued or funded in the past. The working group producing this article was charged to elicit from the human language technology (HLT) community a set of well-considered directions or rich areas for future research that could lead to major paradigm shifts in the field of automatic speech recognition (ASR) and understanding. ASR has been an area of great interest and activity to the signal processing and HLT communities over the past several decades. As a first step, this group reviewed major developments in the field and the circumstances that led to their success and then focused on areas it deemed especially fertile for future research. Part 1 of this article will focus on historically significant developments in the ASR area, including several major research efforts that were guided by different funding agencies, and suggest general areas in which to focus research.
- Published
- 2009
10. Order estimation for a special class of hidden Markov sources and binary renewal processes
- Authors
Prakash Narayan and Sanjeev Khudanpur
- Subjects
Mathematical optimization, Markov kernel, Markov chain, Estimator, Library and Information Sciences, Computer Science Applications, Markov renewal process, Applied mathematics, Markov property, Renewal theory, Hidden semi-Markov model, Residual time, Information Systems, Mathematics
- Abstract
We consider the estimation of the order, i.e., the number of hidden states, of a special class of discrete-time finite-alphabet hidden Markov sources. This class can be characterized in terms of equivalent renewal processes. No a priori bound is assumed on the maximum permissible order. An order estimator based on renewal types is constructed and shown to be strongly consistent by computing the precise asymptotics of the probability of estimation error. The probability of underestimating the true order decays exponentially in the number of observations, while the probability of overestimation goes to zero sufficiently fast. It is further shown that this estimator has the best possible error exponent in a large class of estimators. Our results are also valid for the general class of binary independent-renewal processes with finite mean renewal times.
- Published
- 2002
11. Continuous space discriminative language modeling
- Authors
Puyang Xu, Matt Post, Chris Callison-Burch, Kenji Sagae, Daniel M. Bikel, Maider Lehr, Izhak Shafran, Damianos Karakos, Brian Roark, Sanjeev Khudanpur, Murat Saraclar, Eva Hasler, Keith Hall, Nathan Glenn, Darcey Riley, Adam Lopez, Emily Prud'hommeaux, Yuan Cao, and Philipp Koehn
- Subjects
Signal processing, Computer science, Speech recognition
- Abstract
Discriminative language modeling is a structured classification problem, and log-linear models have previously been used to address it. In this paper, the standard dot-product feature representation used in log-linear models is replaced by a non-linear function parameterized by a neural network. Embeddings are learned for each word, and features are extracted automatically through the use of convolutional layers. Experimental results show that as a stand-alone model, the continuous-space model yields a significantly lower word error rate (1% absolute) while having a much more compact parameterization (60%-90% smaller). When combined with the baseline scores, our approach performs equally well.
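The architecture sketched in this abstract, embeddings plus convolution replacing a log-linear dot product, can be illustrated in a few lines of NumPy. All parameter shapes, names, and the tanh/max-pool choices below are illustrative assumptions, not the paper's exact network:

```python
import numpy as np

def conv_sentence_score(word_ids, embeddings, conv_filters, w_out):
    """Score a word sequence: look up one embedding per word, slide
    each filter over adjacent embeddings, max-pool over positions,
    and map the pooled features through a linear output layer."""
    E = embeddings[word_ids]                    # (T, d) word embeddings
    T, d = E.shape
    width = conv_filters.shape[1] // d          # words spanned per filter
    feats = []
    for f in conv_filters:
        w = f.reshape(width, d)
        acts = [np.tanh(np.sum(E[t:t + width] * w))
                for t in range(T - width + 1)]
        feats.append(max(acts))                 # max-pool over time
    return float(np.dot(w_out, feats))

# toy setup: vocabulary of 10 words, embedding dim 4,
# 3 filters each spanning 2-word windows
rng = np.random.default_rng(1)
emb = rng.standard_normal((10, 4))
filters = rng.standard_normal((3, 8))
score = conv_sentence_score([1, 4, 2, 7], emb, filters,
                            rng.standard_normal(3))
```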
- Published
- 2012