Author: "Pierre L. Dognin" / Publisher: ieee - Searchworks@Jio Institute Digital Library Search Results

1. Evaluating Deep Scattering Spectra with deep neural networks on large scale spontaneous speech task

Author: Vaibhava Goel, Pierre L. Dognin, and Petr Fousek
Subjects: Normalization (statistics), Artificial neural network, Computer science, Time delay neural network, Speech recognition, Normalization (image processing), Acoustic model, Pattern recognition, Image processing, Computer Science::Sound, Feature (machine learning), Spectrogram, Artificial intelligence, Image warping
Abstract: Deep Scattering Network features introduced for image processing have recently proved useful in speech recognition as an alternative to log-mel features for Deep Neural Network (DNN) acoustic models. Scattering features use wavelet decomposition, directly producing log-frequency spectrograms which are robust to local time warping and provide additional information within higher order coefficients. This paper extends previous work by showing how scattering features perform on a state-of-the-art spontaneous speech recognition task using a DNN acoustic model. We revisit feature normalization and compression topics in an extensive study, putting emphasis on comparing models of the same size. We observe that scattering features outperform baseline log-mel in all conditions, with additional gains from multi-resolution processing.
Published: 2015
Full Text: View/download PDF

2. Annealed dropout trained maxout networks for improved LVCSR

Author: Pierre L. Dognin, Xiaodong Cui, Steven J. Rennie, and Vaibhava Goel
Subjects: Training set, Computer science, Speech recognition, Word error rate, Acoustic model, Pattern recognition, Artificial intelligence, Dropout (neural networks)
Abstract: A significant barrier to progress in automatic speech recognition (ASR) capability is the empirical reality that techniques rarely “scale”—the yield of many apparently fruitful techniques rapidly diminishes to zero as the training criterion or decoder is strengthened, or the size of the training set is increased. Recently we showed that annealed dropout—a regularization procedure which gradually reduces the percentage of neurons that are randomly zeroed out during DNN training—leads to substantial word error rate reductions in the case of small to moderate training data amounts, and acoustic models trained based on the cross-entropy (CE) criterion [1]. In this paper we show that deep Maxout networks trained using annealed dropout can substantially improve the quality of commercial-grade LVCSR systems even when the acoustic model is trained with sequence-level training criterion, and on large amounts of data.
Published: 2015
Full Text: View/download PDF
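
The annealed dropout idea summarized in the abstract above (a dropout rate that is gradually reduced during training) can be illustrated with a minimal sketch. The linear schedule, the starting rate of 0.5, and the function names are assumptions made for illustration; the abstract does not specify the schedule the authors used.

```python
import random


def annealed_dropout_rate(epoch, initial_rate=0.5, anneal_epochs=10):
    """Dropout probability for an epoch, decayed linearly to zero.

    The linear decay and the default values are illustrative assumptions.
    """
    remaining = max(0.0, 1.0 - epoch / anneal_epochs)
    return initial_rate * remaining


def apply_dropout(activations, rate, rng):
    """Zero each activation with probability `rate` and rescale survivors."""
    if rate <= 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


if __name__ == "__main__":
    rng = random.Random(0)
    for epoch in (0, 5, 10):
        rate = annealed_dropout_rate(epoch)
        print(epoch, rate, apply_dropout([1.0, 1.0, 1.0, 1.0], rate, rng))
```

Once the rate reaches zero the layer passes activations through unchanged, so the final epochs train the full network without random masking.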

3. Direct product based deep belief networks for automatic speech recognition

Author: Steven J. Rennie, Petr Fousek, Vaibhava Goel, and Pierre L. Dognin
Subjects: Factorial, Artificial neural network, Speech recognition, Rank (computer programming), Machine learning, Matrix decomposition, Set (abstract data type), Deep belief network, Kronecker delta, Artificial intelligence, Direct product, Mathematics
Abstract: In this paper, we present new methods for parameterizing the connections of neural networks using sums of direct products. We show that low rank parameterizations of weight matrices are a subset of this set, and explore the theoretical and practical benefits of representing weight matrices using sums of Kronecker products. ASR results on a 50 hr subset of the English Broadcast News corpus indicate that the approach is promising. In particular, we show that a factorial network with more than 150 times fewer parameters in its bottom layer than its standard unconstrained counterpart suffers minimal WER degradation, and that by using sums of Kronecker products, we can close the gap in WER performance while maintaining very significant parameter savings. In addition, direct product DBNs consistently outperform standard DBNs with the same number of parameters. These results have important implications for research on deep belief networks (DBNs). They imply that we should be able to train neural networks with thousands of neurons and minimal restrictions much more rapidly than is currently possible, and that by using sums of direct products, it will be possible to train neural networks with literally millions of neurons tractably, an exciting prospect.
Published: 2013
Full Text: View/download PDF
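
To make the parameter savings mentioned in the abstract above concrete, the following sketch builds a weight matrix as a sum of Kronecker products and compares parameter counts. The matrix sizes (32 x 32 factors giving a 1024 x 1024 matrix) and all function names are hypothetical choices for illustration, not the configuration used in the paper.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


def sum_of_kron(factor_pairs):
    """Element-wise sum of kron(a, b) over (a, b) pairs of matching shapes."""
    terms = [kron(a, b) for a, b in factor_pairs]
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*terms)]


def kron_param_count(m, n, p, q, terms=1):
    """Parameters stored by `terms` pairs of (m x n) and (p x q) factors."""
    return terms * (m * n + p * q)


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[0, 1], [1, 0]]
    print(kron(a, b))
    # Illustrative sizes only: a 1024 x 1024 matrix from 32 x 32 factors.
    full = (32 * 32) * (32 * 32)
    compact = kron_param_count(32, 32, 32, 32, terms=1)
    print(full, compact, full // compact)
```

Adding more terms to the sum raises the parameter count linearly while letting the product represent a wider class of matrices, which is the trade-off the abstract describes.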

4. Factorial Hidden Restricted Boltzmann Machines for noise robust speech recognition

Author: Pierre L. Dognin, Petr Fousek, and Steven J. Rennie
Subjects: Restricted Boltzmann machine, Computer science, Speech recognition, Boltzmann machine, Inference, Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing), Pattern recognition, Deep belief network, Approximate inference, Computer Science::Sound, Robustness (computer science), Source separation, Artificial intelligence, Hidden Markov model, Gaussian process
Abstract: We present the Factorial Hidden Restricted Boltzmann Machine (FHRBM) for robust speech recognition. Speech and noise are modeled as independent RBMs, and the interaction between them is explicitly modeled to capture how speech and noise combine to generate observed noisy speech features. In contrast with RBMs, where the bottom layer of random variables is observed, inference in the FHRBM is intractable, scaling exponentially with the number of hidden units. We introduce variational algorithms for efficient approximate inference that scale linearly with the number of hidden units. Compared to traditional factorial models of noisy speech, which are based on GMMs, the FHRBM has the advantage that the representations of both speech and noise are highly distributed, allowing the model to learn a parts-based representation of noisy speech data that can generalize better to previously unseen noise compositions. Preliminary results suggest that the approach is promising.
Published: 2012
Full Text: View/download PDF

5. Matched-condition robust Dynamic Noise Adaptation

Author: Pierre L. Dognin, Petr Fousek, and Steven J. Rennie
Subjects: Online model, Computer science, Speech recognition, Pattern recognition, Regression analysis, Mutual information, FMLLR, Background noise, Robustness (computer science), Maximum likelihood linear regression, Artificial intelligence, Dynamic noise
Abstract: In this paper we describe how the model-based noise robustness algorithm for previously unseen noise conditions, Dynamic Noise Adaptation (DNA), can be made robust to matched data, without the need to do any system re-training. The approach is to do online model selection and averaging between two DNA models of noise: one that is tracking the evolving state of the background noise, and one clamped to the null mismatch hypothesis. The approach, which we call DNA with (matched) condition detection (DNA-CD), improves the performance of a commercial-grade speech recognizer that utilizes feature-space Maximum Mutual Information (fMMI), boosted MMI (bMMI), and feature-space Maximum Likelihood Linear Regression (fMLLR) compensation by 15% relative at signal-to-noise ratios (SNRs) below 10 dB, and over 8% relative overall.
Published: 2011
Full Text: View/download PDF

6. Robust speech recognition using dynamic noise adaptation

Author: Steven J. Rennie, Petr Fousek, and Pierre L. Dognin
Subjects: Noise, Signal-to-noise ratio, Computer science, Speech recognition, Word error rate, Adaptation (computer science), Hidden Markov model, Dynamic noise, FMLLR
Abstract: Dynamic noise adaptation (DNA) [1, 2] is a model-based technique for improving automatic speech recognition (ASR) performance in noise. DNA has shown promise on artificially mixed data such as the Aurora II and DNA+Aurora II tasks [1]—significantly outperforming well-known techniques like the ETSI AFE and fMLLR [2]—but has never been tried on real data. In this paper, we present new results generated by commercial-grade ASR systems trained on large amounts of data. We show that DNA improves upon the performance of the spectral subtraction (SS) and stochastic fMLLR algorithms of our embedded recognizers, particularly in unseen noise conditions, and describe how DNA has been evolved to become suitable for deployment in low-latency ASR systems. DNA improves our best embedded system, which utilizes SS, fMLLR, and fMPE [3] by over 22% relative at SNRs below 6 dB, reducing the word error rate in these adverse conditions from 4.24% to 3.29%.
Published: 2011
Full Text: View/download PDF
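
The "over 22% relative" figure in the abstract above follows from the two word error rates it quotes. A short check, with a hypothetical helper name:

```python
def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Word error rates quoted in the abstract: 4.24% reduced to 3.29%.
    print(f"{relative_reduction(4.24, 3.29):.1%}")  # prints 22.4%
```

The absolute reduction is 0.95 percentage points; dividing by the 4.24% baseline gives the relative figure.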

7. Refactoring acoustic models using variational density approximation

Author: Vaibhava Goel, John R. Hershey, Pierre L. Dognin, and Peder A. Olsen
Subjects: Gaussian, Word error rate, Pattern recognition, Mixture model, Data modeling, Artificial intelligence, Hidden Markov model, Cluster analysis, Gaussian process, Algorithm, Reference model, Mathematics
Abstract: In model-based pattern recognition it is often useful to change the structure, or refactor, a model. For example, we may wish to find a Gaussian mixture model (GMM) with fewer components that best approximates a reference model. One application for this arises in speech recognition, where a variety of model size requirements exists for different platforms. Since the target size may not be known a priori, one strategy is to train a complex model and subsequently derive models of lower complexity. We present methods for reducing model size without training data, following two strategies: GMM-approximation and Gaussian clustering based on divergences. A variational expectation-maximization algorithm is derived that unifies these two approaches. The resulting algorithms reduce the model size by 50% with less than 4% increase in error rate relative to the same-sized model trained on data. In fact, for up to 35% reduction in size, the algorithms can improve accuracy relative to baseline.
Published: 2009
Full Text: View/download PDF

8. A fast, accurate approximation to log likelihood of Gaussian mixture models

Author: Pierre L. Dognin, John R. Hershey, Vaibhava Goel, and Peder A. Olsen
Subjects: Approximation theory, Exponential distribution, Approximation error, Gaussian, Prior probability, Statistics, Word error rate, Mixture model, Gaussian process, Algorithm, Mathematics
Abstract: It has been a common practice in speech recognition and elsewhere to approximate the log likelihood of a Gaussian mixture model (GMM) with the maximum component log likelihood. While often a computational necessity, the max approximation comes at a price of inferior modeling when the Gaussian components significantly overlap. This paper shows how the approximation error can be reduced by changing component priors. In our experiments the loss in word error rate due to max approximation, albeit small, is reduced by 50–100% at no cost in computational efficiency. Furthermore, we expect acoustic models will become larger with time and increase component overlap and word error rate loss. This makes reducing the approximation error more relevant. The techniques considered do not use the original data and can easily be applied as a post-processing step to any GMM.
Published: 2009
Full Text: View/download PDF
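
The max approximation discussed in the abstract above replaces the log of a sum over mixture components with the largest single component term. The sketch below compares the two for one-dimensional Gaussians; the example mixtures and function names are assumptions for illustration, and the prior-adjustment technique proposed in the paper is not implemented here.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def component_log_terms(x, weights, means, variances):
    """Per-component log(weight) + log density at x."""
    return [
        math.log(w) + log_gauss(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]


def exact_log_likelihood(terms):
    """Log of the summed component likelihoods (stable log-sum-exp)."""
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def max_approx_log_likelihood(terms):
    """Max approximation: keep only the best-scoring component."""
    return max(terms)


if __name__ == "__main__":
    # Two illustrative mixtures: overlapping vs well-separated components.
    for means in ([0.0, 0.5], [0.0, 20.0]):
        terms = component_log_terms(0.0, [0.5, 0.5], means, [1.0, 1.0])
        gap = exact_log_likelihood(terms) - max_approx_log_likelihood(terms)
        print(means, gap)
```

The gap is never negative and grows as components overlap, reaching log 2 when two equally weighted components coincide, which matches the abstract's remark that the approximation degrades under significant overlap.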

9. Parameter optimization for vocal tract length normalization

Author: Amro El-Jaroudi, Jayadev Billa, and Pierre L. Dognin
Subjects: Normalization (statistics), Signal processing, Computer science, Maximum likelihood, Word error rate, Pattern recognition, Artificial intelligence, Image warping, Vocal tract
Abstract: This paper focuses on the optimization of model parameters for vocal tract length normalization (VTLN). For maximum likelihood (ML) based normalization techniques, the complexity of the VTL-models is a source of variation in system performance. An optimal complexity for the VTL-model that ensures best global word error rate is proposed. The choice of frequency warping factor also depends on the signal processing step of VTLN. A best set of parameters for the VTLN signal processing stage is proposed with extensive results for an optimal frequency range.
Published: 2002
Full Text: View/download PDF

10. The 2001 BYBLOS English large vocabulary conversational speech recognition system

Author: Pierre L. Dognin, Herbert Gish, Carl Quillen, Fred Richardson, Thomas Colthurst, Spyros Matsoukas, Alex Solomonoff, and Owen Kimball
Subjects: Vocabulary, Computer science, Speech recognition, Word error rate, Initialization, Linear discriminant analysis, Thresholding, Test set, Benchmark (computing), NIST, Artificial intelligence, Natural language processing
Abstract: This paper describes the BYBLOS system that BBN used to participate in the 2001 NIST Hub-5 evaluation benchmark. We outline the procedure used for training and decoding, and present the algorithmic improvements made to the system, along with experimental results. These improvements include a Gaussian splitting initialization procedure, the use of Linear Discriminant Analysis, and processing of additional acoustic training data. We also discuss our system combination and confidence-based thresholding methods. Experiments on an internal validation test set show that all these system improvements provide an 8.1% relative reduction in word error rate compared to our 2000 LVCSR system.
Published: 2002
Full Text: View/download PDF


10 results on '"Pierre L. Dognin"'
