11 results for "Sanjeev Khudanpur"
Search Results
2. LET-Decoder: A WFST-Based Lazy-Evaluation Token-Group Decoder With Exact Lattice Generation
- Authors
Mahsa Yarmohammadi, Daniel Povey, Hang Lv, Li Ke, Lei Xie, Yiming Wang, and Sanjeev Khudanpur
- Subjects
Computer science, Applied Mathematics, Frame (networking), Security token, Token passing, Signal Processing, Overhead (computing), Electrical and Electronic Engineering, Lazy evaluation, Hidden Markov model, Algorithm, Word (computer architecture), Decoding methods
- Abstract
We propose a novel lazy-evaluation token-group decoding algorithm with on-the-fly composition of weighted finite-state transducers (WFSTs) for large-vocabulary continuous speech recognition. In the standard on-the-fly composition decoder, a base WFST and one or more incremental WFSTs are composed during decoding, and a token-passing algorithm is then employed to generate the lattice on the composed search space, resulting in substantial computational overhead. To improve speed, the proposed algorithm adopts (1) a token-group method, which groups tokens sharing the same state in the base WFST on each frame and limits the capacity of each group, and (2) a lazy-evaluation method, which does not expand a token group and its source token groups until a word label is processed during decoding. Experiments show that the proposed decoder runs up to three times faster than the standard on-the-fly composition decoder.
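The token-group idea in this abstract can be illustrated with a minimal sketch (the data structures and function names below are hypothetical; the actual decoder operates on WFST states and lattices): active tokens that share a base-WFST state are collected into one group, and each group's capacity is capped by keeping only the lowest-cost tokens.

```python
import heapq

def prune_token_group(tokens, capacity):
    """Keep only the `capacity` best (lowest-cost) tokens in a group."""
    return heapq.nsmallest(capacity, tokens, key=lambda t: t[1])

def group_tokens_by_state(tokens, capacity):
    """Group (state, cost) tokens by their base-WFST state and cap each
    group's size, dropping the highest-cost tokens -- a toy analogue of
    the token-group method."""
    groups = {}
    for state, cost in tokens:
        groups.setdefault(state, []).append((state, cost))
    return {s: prune_token_group(ts, capacity) for s, ts in groups.items()}

# three tokens share base state 0; with capacity 2 the worst is dropped
tokens = [(0, 1.5), (0, 0.2), (0, 3.0), (1, 0.9)]
grouped = group_tokens_by_state(tokens, capacity=2)
```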
- Published
- 2021
3. Flat-Start Single-Stage Discriminatively Trained HMM-Based Models for ASR
- Authors
Sanjeev Khudanpur, Hossein Sameti, Daniel Povey, and Hossein Hadian
- Subjects
Context model, Acoustics and Ultrasonics, Artificial neural network, Computer science, Pipeline (computing), Speech recognition, Mutual information, Reduction (complexity), Computational Mathematics, Computer Science (miscellaneous), Electrical and Electronic Engineering, Hidden Markov model, Decoding methods, Word (computer architecture)
- Abstract
In recent years, end-to-end approaches to automatic speech recognition have received considerable attention because they are much faster in terms of resource preparation. However, conventional multistage approaches, which rely on a pipeline of hidden Markov model (HMM)-GMM training and tree-building steps, still give state-of-the-art results on most databases. In this study, we investigate flat-start single-stage training of neural networks using the lattice-free maximum mutual information (LF-MMI) objective function with HMMs for large-vocabulary continuous speech recognition. We thoroughly examine the issues that arise in such a setup and propose a standalone system that achieves word error rates (WER) comparable with those of state-of-the-art multistage systems while being much faster to prepare. We propose using full biphones to enable flat-start context-dependent (CD) modeling and show experimentally that this CD modeling approach can be almost as effective as regular tree-based CD modeling. We show that our flat-start LF-MMI setup, together with this tree-free CD modeling technique, achieves a 10-25% relative WER reduction compared with other end-to-end methods on well-known databases. The improvements are larger for smaller databases.
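The tree-free CD modeling described above keeps every left-context biphone as its own unit instead of clustering contexts with a tree, which is what makes flat-start training possible. A minimal sketch of enumerating such an inventory (the phone set and the use of a silence left context are illustrative assumptions, not the paper's actual configuration):

```python
def full_biphone_inventory(phones):
    """Enumerate all left-context biphone units (left, phone).
    Full biphones keep every pair -- no tree-based clustering --
    so the inventory can be built before any data is seen."""
    contexts = phones + ["<sil>"]  # hypothetical: phones plus a silence context
    return [(left, p) for left in contexts for p in phones]

# toy phone set: 4 left contexts x 3 phones = 12 CD units
units = full_biphone_inventory(["a", "b", "c"])
```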
- Published
- 2018
4. Low Latency Acoustic Modeling Using Temporal Convolution and LSTMs
- Authors
Vijayaditya Peddinti, Daniel Povey, Sanjeev Khudanpur, and Yiming Wang
- Subjects
Artificial neural network, Microphone, Time delay neural network, Computer science, Applied Mathematics, Speech recognition, Word error rate, Context (language use), Frame rate, Convolution, Signal Processing, Electrical and Electronic Engineering, Latency (engineering)
- Abstract
Bidirectional long short-term memory (BLSTM) acoustic models provide a significant word error rate reduction compared to their unidirectional counterparts, as they model both past and future temporal contexts. However, it is nontrivial to deploy bidirectional acoustic models for online speech recognition due to the increase in latency. In this letter, we propose the use of temporal convolution, in the form of time-delay neural network (TDNN) layers, along with unidirectional LSTM layers to limit the latency to 200 ms. This architecture has been shown to outperform state-of-the-art low frame rate (LFR) BLSTM models. We further improve these LFR BLSTM acoustic models by operating them at higher frame rates at the lower layers and show that the proposed model performs similarly to these mixed frame rate BLSTMs. We present results on the Switchboard 300-hour LVCSR task and the AMI LVCSR task in its three microphone conditions.
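The 200 ms latency figure above comes from the total future (right) context of the network: each TDNN layer that looks some frames ahead adds that many frame shifts of delay, while unidirectional LSTM layers add none. A toy calculation, assuming a hypothetical layer stack and the usual 10 ms frame shift:

```python
def right_context_latency(layer_right_contexts, frame_shift_ms=10):
    """Algorithmic latency of a layer stack: the sum of each layer's
    right (future) context, in frames, times the frame shift.
    Illustrative only -- real deployed latency also depends on
    chunking and feature extraction."""
    return sum(layer_right_contexts) * frame_shift_ms

# hypothetical TDNN right contexts summing to 20 frames -> 200 ms;
# unidirectional LSTM layers contribute 0
latency = right_context_latency([3, 3, 3, 3, 3, 5, 0, 0])
```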
- Published
- 2018
5. A Dataset and Benchmarks for Segmentation and Recognition of Gestures in Robotic Surgery
- Authors
Narges Ahmidi, Lingling Tao, Shahin Sefati, Sanjeev Khudanpur, Colin Lea, Gregory D. Hager, Yixin Gao, Luca Zappella, René Vidal, and Benjamin Bejar Haro
- Subjects
Conditional random field, Engineering, Biomedical Engineering, Machine learning, Field (computer science), Pattern recognition, Activity recognition, Three-dimensional imaging, Robotic surgical procedures, Feature (machine learning), Segmentation, Hidden Markov model, Gestures, Markov chain, Benchmarking, Clinical competence, Artificial intelligence
- Abstract
Objective: State-of-the-art techniques for surgical data analysis report promising results for automated skill assessment and action recognition. The contributions of many of these techniques, however, are limited to study-specific data and validation metrics, making assessment of progress across the field extremely challenging. Methods: In this paper, we address two major problems for surgical data analysis: first, the lack of uniformly shared datasets and benchmarks, and second, the lack of consistent validation processes. We address the former by presenting the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), a public dataset that we have created to support comparative research benchmarking. JIGSAWS contains synchronized video and kinematic data from multiple performances of robotic surgical tasks by operators of varying skill. We address the latter by presenting a well-documented evaluation methodology and reporting results for six techniques for automated segmentation and classification of time-series data on JIGSAWS. These techniques comprise four temporal approaches for joint segmentation and classification: hidden Markov model (HMM), sparse HMM, Markov/semi-Markov conditional random field, and skip-chain conditional random field; and two feature-based approaches that classify fixed segments: bag of spatiotemporal features and linear dynamical systems. Results: Most methods recognize gesture activities with approximately 80% overall accuracy under both leave-one-super-trial-out and leave-one-user-out cross-validation settings. Conclusion: Current methods show promising results on this shared dataset, but room for significant progress remains, particularly for consistent prediction of gesture activities across different surgeons. Significance: The results reported in this paper provide the first systematic and uniform evaluation of surgical activity recognition techniques on the benchmark database.
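The leave-one-user-out protocol mentioned above holds out all trials of one surgeon at a time and trains on everyone else's. A minimal sketch of generating such splits (the trial naming and dictionary layout are hypothetical, not the JIGSAWS file format):

```python
def leave_one_user_out_splits(trials):
    """Yield (held_out_user, train_trials, test_trials) tuples, holding
    out each user's trials in turn -- the leave-one-user-out protocol.
    `trials` maps a user id to that user's list of trials."""
    users = sorted(trials)
    for held_out in users:
        test = trials[held_out]
        train = [t for u in users if u != held_out for t in trials[u]]
        yield held_out, train, test

# toy example with three surgeons
trials = {"B": ["B1", "B2"], "C": ["C1"], "D": ["D1", "D2"]}
splits = list(leave_one_user_out_splits(trials))
```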
- Published
- 2017
6. Stepwise Optimal Subspace Pursuit for Improving Sparse Recovery
- Authors
Sanjeev Khudanpur, Trac D. Tran, and Balakrishnan Varadarajan
- Subjects
Mathematical optimization, Signal reconstruction, Iterative method, Applied Mathematics, Approximation algorithm, Compressed sensing, Signal Processing, Algorithm design, Electrical and Electronic Engineering, Closed-form expression, Greedy algorithm, Algorithm, Mathematics, Sparse matrix
- Abstract
We propose a new iterative algorithm to reconstruct an unknown sparse signal x from a set of projected measurements y = Φx. Unlike existing methods, which rely crucially on the near orthogonality of the sampling matrix Φ, our approach makes stepwise optimal updates even when the columns of Φ are not orthogonal. We invoke a block-wise matrix inversion formula to obtain a closed-form expression for the increase (reduction) in the L2-norm of the residue obtained by removing (adding) a single element from (to) the presumed support of x. We then use this expression to design a computationally tractable algorithm to search for the nonzero components of x. We show that, compared to currently popular sparsity-seeking matching pursuit algorithms, each step of the proposed algorithm is locally optimal with respect to the actual objective function. We demonstrate experimentally that the algorithm significantly outperforms conventional techniques in recovering sparse signals whose nonzero values have exponentially decaying magnitudes or are distributed as N(0,1).
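The stepwise-optimal step described above can be sketched numerically. The paper computes the change in residue in closed form via block-wise matrix inversion; the sketch below instead recomputes the least-squares residue directly for each candidate column, which is slower but shows the same selection rule (all names and the toy problem are illustrative):

```python
import numpy as np

def residual_norm(Phi, y, support):
    """L2 norm of the residue after least-squares projection of y
    onto the columns of Phi indexed by `support`."""
    if not support:
        return float(np.linalg.norm(y))
    A = Phi[:, sorted(support)]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.linalg.norm(y - A @ coef))

def best_column_to_add(Phi, y, support):
    """Pick the column outside the current support whose addition most
    reduces the residue -- the locally optimal greedy step."""
    candidates = set(range(Phi.shape[1])) - set(support)
    return min(candidates,
               key=lambda j: residual_norm(Phi, y, set(support) | {j}))

# toy problem: y is exactly column 2 of a random Phi, so the first
# stepwise-optimal addition recovers index 2
rng = np.random.default_rng(0)
Phi = rng.standard_normal((8, 5))
x = np.zeros(5); x[2] = 1.0
y = Phi @ x
best = best_column_to_add(Phi, y, set())
```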
- Published
- 2011
7. Likelihood-Based Semi-Supervised Model Selection With Applications to Speech Processing
- Authors
Patrick J. Wolfe, Sanjeev Khudanpur, and Christopher White
- Subjects
Computer science, Robust statistics, Word error rate, Machine learning, Semi-supervised learning, Data modeling, Electrical and Electronic Engineering, Hidden Markov model, Model selection, Nonparametric statistics, Pattern recognition, Speech processing, Signal Processing
- Abstract
In conventional supervised pattern recognition tasks, model selection is typically accomplished by minimizing the classification error rate on a set of so-called development data, subject to ground-truth labeling by human experts or some other means. In the context of speech processing systems and other large-scale practical applications, however, such labeled development data are typically costly and difficult to obtain. This article proposes an alternative semi-supervised framework for likelihood-based model selection that leverages unlabeled data by using trained classifiers representing each model to automatically generate putative labels. The errors that result from this automatic labeling are shown to be amenable to results from robust statistics, which in turn provide for minimax-optimal censored likelihood ratio tests that recover the nonparametric sign test as a limiting case. This approach is then validated experimentally using a state-of-the-art automatic speech recognition system to select between candidate word pronunciations using unlabeled speech data that only potentially contain instances of the words under test. Results provide supporting evidence for the utility of this approach, and suggest that it may also find use in other applications of machine learning.
- Published
- 2010
8. Updated MINDS report on speech recognition and understanding, Part 2 [DSP Education]
- Authors
Douglas O'Shaughnessy, Nelson Morgan, James Glass, Sanjeev Khudanpur, J. Baker, Li Deng, and Chin-Hui Lee
- Subjects
Machine translation, Computer science, Applied Mathematics, Speech recognition, Speech processing, Data resources, Signal Processing, Language technology, Electrical and Electronic Engineering, Language translation, Digital signal processing
- Abstract
This article is the second part of an updated version of the "MINDS 2006-2007 Report of the Speech Understanding Working Group," one of five reports emanating from two workshops entitled "Meeting of the MINDS: Future Directions for Human Language Technology," sponsored by the U.S. Disruptive Technology Office (DTO). (MINDS is an acronym for "machine translation, information retrieval, natural-language processing, data resources, and speech understanding").
- Published
- 2009
9. Developments and directions in speech recognition and understanding, Part 1 [DSP Education]
- Authors
J. Baker, Nelson Morgan, James Glass, Sanjeev Khudanpur, Li Deng, Douglas O'Shaughnessy, and Chin-Hui Lee
- Subjects
Knowledge representation and reasoning, Computer science, Applied Mathematics, Speech recognition, Speech processing, Field (computer science), Paradigm shift, Signal Processing, Language technology, Electrical and Electronic Engineering, Digital signal processing
- Abstract
To advance research, it is important to identify promising future research directions, especially those that have not been adequately pursued or funded in the past. The working group producing this article was charged to elicit from the human language technology (HLT) community a set of well-considered directions or rich areas for future research that could lead to major paradigm shifts in the field of automatic speech recognition (ASR) and understanding. ASR has been an area of great interest and activity to the signal processing and HLT communities over the past several decades. As a first step, this group reviewed major developments in the field and the circumstances that led to their success and then focused on areas it deemed especially fertile for future research. Part 1 of this article will focus on historically significant developments in the ASR area, including several major research efforts that were guided by different funding agencies, and suggest general areas in which to focus research.
- Published
- 2009
10. Order estimation for a special class of hidden Markov sources and binary renewal processes
- Authors
Prakash Narayan and Sanjeev Khudanpur
- Subjects
Mathematical optimization, Markov kernel, Markov chain, Estimator, Library and Information Sciences, Computer Science Applications, Markov renewal process, Applied mathematics, Markov property, Renewal theory, Hidden semi-Markov model, Residual time, Information Systems, Mathematics
- Abstract
We consider the estimation of the order, i.e., the number of hidden states, of a special class of discrete-time finite-alphabet hidden Markov sources. This class can be characterized in terms of equivalent renewal processes. No a priori bound is assumed on the maximum permissible order. An order estimator based on renewal types is constructed and shown to be strongly consistent by computing the precise asymptotics of the probability of estimation error. The probability of underestimating the true order decays exponentially in the number of observations, while the probability of overestimation goes to zero sufficiently fast. It is further shown that this estimator has the best possible error exponent in a large class of estimators. Our results are also valid for the general class of binary independent-renewal processes with finite mean renewal times.
- Published
- 2002
11. Continuous space discriminative language modeling
- Authors
Puyang Xu, Matt Post, Chris Callison-Burch, Kenji Sagae, Daniel M. Bikel, Maider Lehr, Izhak Shafran, Damianos Karakos, Brian Roark, Sanjeev Khudanpur, Murat Saraclar, Eva Hasler, Keith Hall, Nathan Glenn, Darcey Riley, Adam Lopez, Emily Prud'hommeaux, Yuan Cao, and Philipp Koehn
- Subjects
Signal processing, Computer science, Speech recognition
- Abstract
Discriminative language modeling is a structured classification problem, and log-linear models have previously been used to address it. In this paper, the standard dot-product feature representation used in log-linear models is replaced by a non-linear function parameterized by a neural network. Embeddings are learned for each word, and features are extracted automatically through the use of convolutional layers. Experimental results show that as a stand-alone model, the continuous-space model yields a significantly lower word error rate (1% absolute) while having a much more compact parameterization (60%-90% smaller). When combined with the baseline scores, our approach performs equally well.
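The architecture sketched in this abstract, embeddings plus convolution replacing a log-linear dot product, can be illustrated in a few lines of NumPy. All parameter shapes, names, and the tanh/max-pool choices below are illustrative assumptions, not the paper's exact network:

```python
import numpy as np

def conv_sentence_score(word_ids, embeddings, conv_filters, w_out):
    """Score a word sequence: look up one embedding per word, slide
    each filter over adjacent embeddings, max-pool over positions,
    and map the pooled features through a linear output layer."""
    E = embeddings[word_ids]                    # (T, d) word embeddings
    T, d = E.shape
    width = conv_filters.shape[1] // d          # words spanned per filter
    feats = []
    for f in conv_filters:
        w = f.reshape(width, d)
        acts = [np.tanh(np.sum(E[t:t + width] * w))
                for t in range(T - width + 1)]
        feats.append(max(acts))                 # max-pool over time
    return float(np.dot(w_out, feats))

# toy setup: vocabulary of 10 words, embedding dim 4,
# 3 filters each spanning 2-word windows
rng = np.random.default_rng(1)
emb = rng.standard_normal((10, 4))
filters = rng.standard_normal((3, 8))
score = conv_sentence_score([1, 4, 2, 7], emb, filters,
                            rng.standard_normal(3))
```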
- Published
- 2012