18 results for "Brian Roark"
Search Results
2. Approximating Probabilistic Models as Weighted Finite Automata
- Author
-
Ananda Theertha Suresh, Brian Roark, Michael Riley, and Vlad Schogol
- Subjects
Computational linguistics. Natural language processing, P98-98.5
- Abstract
Weighted finite automata (WFAs) are often used to represent probabilistic models, such as n-gram language models, because among other things, they are efficient for recognition tasks in time and space. The probabilistic source to be represented as a WFA, however, may come in many forms. Given a generic probabilistic model over sequences, we propose an algorithm to approximate it as a WFA such that the Kullback-Leibler divergence between the source model and the WFA target model is minimized. The proposed algorithm involves a counting step and a difference of convex optimization step, both of which can be performed efficiently. We demonstrate the usefulness of our approach on various tasks, including distilling n-gram models from neural models, building compact language models, and building open-vocabulary character models. The algorithms used for these experiments are available in an open-source software library.
- Published
- 2021
- Full Text
- View/download PDF
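The abstract above describes a two-step procedure: a counting step followed by a difference-of-convex optimization that minimizes KL divergence. The sketch below is not the paper's algorithm; it is only a naive Monte Carlo stand-in for the counting step, estimating bigram transition weights of a target automaton from samples drawn from an arbitrary source model. The `sample_from_source` callable is a hypothetical interface.

```python
import random
from collections import defaultdict

def approximate_as_bigram(sample_from_source, num_samples=10000, eos="</s>", bos="<s>"):
    """Naive stand-in for the counting step: draw sequences from the source
    model, count symbol bigrams, and normalize them into per-state transition
    weights of a bigram "WFA" (each state is simply the previous symbol)."""
    counts = defaultdict(lambda: defaultdict(int))
    for _ in range(num_samples):
        prev = bos
        for sym in list(sample_from_source()) + [eos]:
            counts[prev][sym] += 1
            prev = sym
    wfa = {}
    for state, nxt in counts.items():
        total = sum(nxt.values())
        wfa[state] = {sym: c / total for sym, c in nxt.items()}
    return wfa

# Toy source model: emits "ab" or "abb" with unequal probability.
toy_source = lambda: "ab" if random.random() < 0.7 else "abb"
print(approximate_as_bigram(toy_source)["b"])  # transition weights out of state "b"
```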
3. Graph-Based Word Alignment for Clinical Language Evaluation
- Author
-
Emily Prud'hommeaux and Brian Roark
- Subjects
Computational linguistics. Natural language processing, P98-98.5
- Published
- 2022
- Full Text
- View/download PDF
4. Phonotactic Complexity and Its Trade-offs
- Author
-
Tiago Pimentel, Brian Roark, and Ryan Cotterell
- Subjects
Computational linguistics. Natural language processing, P98-98.5
- Abstract
We present methods for calculating a measure of phonotactic complexity, bits per phoneme, that permits a straightforward cross-linguistic comparison. When given a word, represented as a sequence of phonemic segments such as symbols in the International Phonetic Alphabet, and a statistical model trained on a sample of word types from the language, we can approximately measure bits per phoneme using the negative log-probability of that word under the model. This simple measure allows us to compare the entropy across languages, giving insight into how complex a language's phonotactics is. Using a collection of 1016 basic concept words across 106 languages, we demonstrate a very strong negative correlation of -0.74 between bits per phoneme and the average length of words.
- Published
- 2020
- Full Text
- View/download PDF
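The measure in the abstract above is the negative log2-probability of a word under a trained phonotactic model, averaged over its segments. A minimal sketch follows, assuming an add-one-smoothed phoneme bigram model and toy "phoneme" strings; none of this is the paper's data or code.

```python
import math
from collections import defaultdict

def train_bigram(words):
    """Add-one-smoothed phoneme bigram model; each word is a sequence of segments."""
    counts, vocab = defaultdict(lambda: defaultdict(int)), {"</w>"}
    for w in words:
        prev = "<w>"
        for p in list(w) + ["</w>"]:
            counts[prev][p] += 1
            vocab.add(p)
            prev = p
    V = len(vocab)
    def prob(prev, p):
        total = sum(counts[prev].values())
        return (counts[prev][p] + 1) / (total + V)
    return prob

def bits_per_phoneme(word, prob):
    """Negative log2-probability of the word, averaged over its segments."""
    prev, bits = "<w>", 0.0
    segments = list(word) + ["</w>"]
    for p in segments:
        bits -= math.log2(prob(prev, p))
        prev = p
    return bits / len(segments)

model = train_bigram(["kat", "kit", "tak", "tik"])  # toy "phoneme" strings
print(round(bits_per_phoneme("kat", model), 3))
```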
5. Neural Models of Text Normalization for Speech Applications
- Author
-
Hao Zhang, Richard Sproat, Axel H. Ng, Felix Stahlberg, Xiaochang Peng, Kyle Gorman, and Brian Roark
- Subjects
Computational linguistics. Natural language processing, P98-98.5
- Abstract
Machine learning, including neural network techniques, has been applied to virtually every domain in natural language processing. One problem that has been somewhat resistant to effective machine learning solutions is text normalization for speech applications such as text-to-speech synthesis (TTS). In this application, one must decide, for example, that 123 is verbalized as one hundred twenty three in 123 pages but as one twenty three in 123 King Ave. For this task, state-of-the-art industrial systems depend heavily on hand-written language-specific grammars. We propose neural network models that treat text normalization for TTS as a sequence-to-sequence problem, in which the input is a text token in context, and the output is the verbalization of that token. We find that the most effective model, in accuracy and efficiency, is one where the sentential context is computed once and the results of that computation are combined with the computation of each token in sequence to compute the verbalization. This model allows for a great deal of flexibility in terms of representing the context, and also allows us to integrate tagging and segmentation into the process. These models perform very well overall, but occasionally they will predict wildly inappropriate verbalizations, such as reading 3 cm as three kilometers. Although rare, such verbalizations are a major issue for TTS applications. We thus use finite-state covering grammars to guide the neural models, either during training and decoding, or just during decoding, away from such “unrecoverable” errors. Such grammars can largely be learned from data.
- Published
- 2019
- Full Text
- View/download PDF
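The constraint mechanism described above intersects the neural model's hypotheses with a finite-state covering grammar so that "unrecoverable" verbalizations are never emitted. The sketch below replaces both components with stand-ins: `score` plays the role of the neural model and `covering` is a toy predicate standing in for the finite-state grammar; both names and the candidate list are assumptions for illustration only.

```python
def normalize_token(token, candidates, score, covering):
    """Pick the best verbalization for `token`, restricted to candidates
    the covering grammar permits (falling back to all candidates if the
    grammar rejects everything)."""
    allowed = [c for c in candidates if covering(token, c)]
    pool = allowed or candidates
    return max(pool, key=lambda c: score(token, c))

# Toy covering "grammar": a measure expression must keep its unit family.
def covering(token, verbalization):
    if token.endswith("cm"):
        return "centimeter" in verbalization
    return True

# Toy "neural" scores that would otherwise prefer the unrecoverable error.
scores = {"three kilometers": 0.6, "three centimeters": 0.4}
score = lambda tok, v: scores.get(v, 0.0)

print(normalize_token("3 cm", list(scores), score, covering))  # "three centimeters"
```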
6. Probabilistic Top-Down Parsing and Language Modeling
- Author
-
Brian Roark
- Subjects
Computational linguistics. Natural language processing, P98-98.5
- Published
- 2021
- Full Text
- View/download PDF
7. Applications of Lexicographic Semirings to Problems in Speech and Language Processing
- Author
-
Richard Sproat, Mahsa Yarmohammadi, Izhak Shafran, and Brian Roark
- Subjects
Computational linguistics. Natural language processing, P98-98.5
- Published
- 2021
- Full Text
- View/download PDF
8. Finite-State Chart Constraints for Reduced Complexity Context-Free Parsing Pipelines
- Author
-
Brian Roark, Kristy Hollingshead, and Nathan Bodenstab
- Subjects
Computational linguistics. Natural language processing, P98-98.5
- Published
- 2021
- Full Text
- View/download PDF
9. Putting Linguistics into Speech Recognition: The Regulus Grammar Compiler, by Manny Rayner, Beth Ann Hockey, and Pierrette Bouillon (NASA Ames Research Center and University of Geneva). Stanford, CA: CSLI Publications (CSLI Studies in Computational Linguistics, edited by Ann Copestake), 2006, xiv+305 pp; hardbound, ISBN 1-57586-525-4, $65.00; paperbound, ISBN 1-57586-526-2, $25.00
- Author
-
Brian Roark
- Subjects
Computational linguistics. Natural language processing, P98-98.5
- Published
- 2021
- Full Text
- View/download PDF
10. Huffman scanning: Using language models within fixed-grid keyboard emulation
- Author
-
Brian Roark, Chris Gibbons, Russell Beckley, and Melanie Fried-Oken
- Subjects
Emulation, Computer science, Speech recognition, Binary number, Huffman coding, Column (database), Theoretical Computer Science, Human-Computer Interaction, Canonical Huffman code, Asynchronous communication, Symbol (programming), Binary code, Algorithm, Software
- Abstract
Individuals with severe motor impairments commonly enter text using a single binary switch and symbol scanning methods. We present a new scanning method – Huffman scanning – which uses Huffman coding to select the symbols to highlight during scanning, thus minimizing the expected bits per symbol. With our method, the user can select the intended symbol even after switch activation errors. We describe two varieties of Huffman scanning – synchronous and asynchronous – and present experimental results, demonstrating speedups over row/column and linear scanning.
- Published
- 2013
- Full Text
- View/download PDF
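Huffman scanning, as described above, derives a binary code for each symbol from language-model probabilities and, at each step, highlights the subset of symbols whose next code bit is 1; a switch press selects that subset. The sketch below is a generic heapq-based Huffman construction plus the highlight set, not the paper's implementation, and the probabilities are made up.

```python
import heapq
from itertools import count

def huffman_codes(probs):
    """Standard Huffman coding over a symbol distribution."""
    tick = count()  # tie-breaker so heapq never compares dicts
    heap = [(p, next(tick), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tick), merged))
    return heap[0][2]

def highlight_set(codes, bits_so_far):
    """Symbols to highlight next: consistent with the bits already entered
    and whose next code bit is '1'."""
    n = len(bits_so_far)
    return {s for s, c in codes.items()
            if c.startswith(bits_so_far) and len(c) > n and c[n] == "1"}

probs = {"e": 0.4, "t": 0.25, "a": 0.2, "_": 0.15}  # toy LM probabilities
codes = huffman_codes(probs)
print(codes, highlight_set(codes, ""))
```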
11. Speech and Language processing as assistive technologies
- Author
-
Kathleen F. McCoy, Leo Ferres, Brian Roark, Melanie Fried-Oken, and John L. Arnott
- Subjects
Human-Computer Interaction, Multimedia, Computer science, Assistive technology, Software, Linguistics, Theoretical Computer Science
- Abstract
We are delighted to bring you this special issue on speech and language processing for assistive technology. It addresses an important research area that is gaining increased recognition from researchers in speech and language processing, as a rich and fulfilling area on which to focus their work, and from researchers in assistive technology, as the means to dramatically improve communication technologies for individuals with disabilities. This special issue brings together a wide swath of approaches and applications, highlighting the variety this area offers.
- Published
- 2013
- Full Text
- View/download PDF
12. The Application of Natural Language Processing to Augmentative and Alternative Communication
- Author
-
Gregory W. Lesher, Bryan Moulton, Brian Roark, and D. Jeffery Higginbotham
- Subjects
Computer science, Rehabilitation, Physical Therapy, Sports Therapy and Rehabilitation, Communication Aids for Disabled, Augmentative and alternative communication, Assistive technology, Humans, Artificial intelligence, Speech Recognition Software, Interface design, Natural language processing
- Abstract
Significant progress has been made in the application of natural language processing (NLP) to augmentative and alternative communication (AAC), particularly in the areas of interface design and word prediction. This article will survey the current state of the science of NLP in AAC and discuss its future applications for the development of the next generation of AAC technology.
- Published
- 2012
- Full Text
- View/download PDF
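Word prediction, named in the abstract above as one of the main NLP contributions to AAC, can be illustrated with a very small sketch: offer the most frequent words consistent with what the user has typed so far. This is a toy unigram predictor with made-up data, not any particular AAC system.

```python
from collections import Counter

def predict(prefix, counts, k=3):
    """Return the k most frequent words that start with the typed prefix."""
    matches = Counter({w: c for w, c in counts.items() if w.startswith(prefix)})
    return [w for w, _ in matches.most_common(k)]

counts = Counter("the cat sat on the mat the cat ran".split())
print(predict("ca", counts))   # ['cat']
print(predict("th", counts))   # ['the']
```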
13. Discriminative n-gram language modeling
- Author
-
Michael Collins, Brian Roark, and Murat Saraclar
- Subjects
Finite-state machine, Computer science, Speech recognition, Word error rate, Initialization, Pattern recognition, Perceptron, Theoretical Computer Science, Human-Computer Interaction, Reduction (complexity), n-gram, Discriminative model, Language model, Artificial intelligence, Software
- Abstract
This paper describes discriminative language modeling for a large vocabulary speech recognition task. We contrast two parameter estimation methods: the perceptron algorithm, and a method based on maximizing the regularized conditional log-likelihood. The models are encoded as deterministic weighted finite state automata, and are applied by intersecting the automata with word-lattices that are the output from a baseline recognizer. The perceptron algorithm has the benefit of automatically selecting a relatively small feature set in just a couple of passes over the training data. We describe a method based on regularized likelihood that makes use of the feature set given by the perceptron algorithm, and initialization with the perceptron's weights; this method gives an additional 0.5% reduction in word error rate (WER) over training with the perceptron alone. The final system achieves a 1.8% absolute reduction in WER for a baseline first-pass recognition system (from 39.2% to 37.4%), and a 0.9% absolute reduction in WER for a multi-pass recognition system (from 28.9% to 28.0%).
- Published
- 2007
- Full Text
- View/download PDF
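The perceptron training described above scores each recognition hypothesis with n-gram feature weights plus the baseline recognizer score and updates the weights toward the reference transcript. The sketch below uses an n-best list as a stand-in for the paper's weighted lattices, with made-up data; it is a generic structured perceptron, not the paper's system.

```python
from collections import Counter

def ngram_feats(words, n=2):
    """Bag of n-gram features for a hypothesis string."""
    toks = ["<s>"] + words.split() + ["</s>"]
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

def perceptron_rerank(data, epochs=5):
    """Structured-perceptron training over n-best lists: `data` pairs a
    reference transcript with candidate (hypothesis, baseline_score) tuples."""
    w = Counter()
    for _ in range(epochs):
        for ref, nbest in data:
            score = lambda h, s: s + sum(w[f] * c for f, c in ngram_feats(h).items())
            best = max(nbest, key=lambda hs: score(*hs))[0]
            if best != ref:
                w.update(ngram_feats(ref))    # promote reference n-grams
                w.subtract(ngram_feats(best)) # demote the erroneous hypothesis
    return w

data = [("recognize speech", [("recognize speech", -1.2), ("wreck a nice beach", -1.0)])]
weights = perceptron_rerank(data)
print(weights[("recognize", "speech")])  # positive weight after training
```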
14. Utterance classification with discriminative language modeling
- Author
-
Murat Saraclar and Brian Roark
- Subjects
Linguistics and Language, Vocabulary, Computer science, Communication, Speech recognition, Linear model, Word error rate, Pattern recognition, Perceptron, Linear discriminant analysis, Language and Linguistics, Computer Science Applications, Discriminative model, Modeling and Simulation, Computer Vision and Pattern Recognition, Language model, Artificial intelligence, Software, Utterance
- Abstract
This paper investigates discriminative language modeling in a scenario with two kinds of observed errors: errors in ASR transcription and errors in utterance classification. We train joint language and class models either independently or simultaneously, under various parameter update conditions. On a large vocabulary customer service call-classification application, we show that simultaneous optimization of class, n-gram, and class/n-gram feature weights results in a significant WER reduction over a model using just n-gram features, while additionally significantly outperforming a deployed baseline in classification error rate. A range of parameter estimation approaches, based on either the perceptron algorithm or conditional log-linear models, for various feature sets are presented and evaluated. The resulting models are encoded as weighted finite-state automata, and are used by intersecting the model with word lattices.
- Published
- 2006
- Full Text
- View/download PDF
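The joint model in the abstract above combines class features, n-gram features, and class/n-gram conjunction features in one linear model over (class, transcription) pairs. The sketch below is a generic linear-model illustration of that feature layout with invented weights and hypotheses, not the deployed call-classification system.

```python
from collections import Counter

def joint_features(cls, words):
    """Class, n-gram, and class/n-gram conjunction features for one pair."""
    feats = Counter({("class", cls): 1})
    toks = words.split()
    for i in range(len(toks) - 1):
        bigram = (toks[i], toks[i + 1])
        feats[("ngram",) + bigram] += 1
        feats[("class-ngram", cls) + bigram] += 1
    return feats

def best_pair(hypotheses, classes, weights):
    """Jointly pick the hypothesis and class that maximize the linear score."""
    score = lambda c, h: sum(weights[f] * v for f, v in joint_features(c, h).items())
    return max(((c, h) for c in classes for h in hypotheses), key=lambda ch: score(*ch))

weights = Counter({("class-ngram", "billing", "my", "bill"): 2.0,
                   ("class", "billing"): 0.5})
print(best_pair(["check my bill", "check my will"], ["billing", "other"], weights))
```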
15. MAP adaptation of stochastic grammars
- Author
-
Brian Roark, Michael Riley, Richard Sproat, and Michiel Bacchiani
- Subjects
Domain adaptation, Parsing, Computer science, Speech recognition, Probabilistic logic, Machine learning, Adaptation strategies, Theoretical Computer Science, Human-Computer Interaction, Rule-based machine translation, MAP adaptation, Maximum a posteriori estimation, Artificial intelligence, Language model, Software
- Abstract
This paper investigates supervised and unsupervised adaptation of stochastic grammars, including n-gram language models and probabilistic context-free grammars (PCFGs), to a new domain. It is shown that the commonly used approaches of count merging and model interpolation are special cases of a more general maximum a posteriori (MAP) framework, which additionally allows for alternate adaptation approaches. This paper investigates the effectiveness of different adaptation strategies, and, in particular, focuses on the need for supervision in the adaptation process. We show that n-gram models as well as PCFGs benefit from either supervised or unsupervised MAP adaptation in various tasks. For n-gram models, we compare the benefit from supervised adaptation with that of unsupervised adaptation on a speech recognition task with an adaptation sample of limited size (about 17h), and show that unsupervised adaptation can obtain 51% of the 7.7% adaptation gain obtained by supervised adaptation. We also investigate the benefit of using multiple word hypotheses (in the form of a word lattice) for unsupervised adaptation on a speech recognition task for which there was a much larger adaptation sample available. The use of word lattices for adaptation required the derivation of a generalization of the well-known Good-Turing estimate. Using this generalization, we derive a method that uses Monte Carlo sampling for building Katz backoff models. The adaptation results show that, for adaptation samples of limited size (several tens of hours), unsupervised adaptation on lattices gives a performance gain over using transcripts. The experimental results also show that with a very large adaptation sample (1050h), the benefit from transcript-based adaptation matches that of lattice-based adaptation. Finally, we show that PCFG domain adaptation using the MAP framework provides similar gains in F-measure accuracy on a parsing task as was seen in ASR accuracy improvements with n-gram adaptation. Experimental results show that unsupervised adaptation provides 37% of the 10.35% gain obtained by supervised adaptation.
- Published
- 2006
- Full Text
- View/download PDF
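The abstract above notes that count merging and model interpolation are special cases of MAP estimation. A minimal sketch of count merging for bigram counts follows: in-domain (adaptation) counts are scaled by a prior weight beta and added to out-of-domain counts before per-history normalization. The counts and the value of beta are illustrative, not the paper's.

```python
from collections import Counter

def count_merge(out_domain, in_domain, beta=5.0):
    """MAP-style count merging of bigram counts; beta plays the role of the
    prior weight on the adaptation sample."""
    merged = Counter(out_domain)
    for ngram, c in in_domain.items():
        merged[ngram] += beta * c
    # Normalize per history to get conditional bigram probabilities.
    history_totals = Counter()
    for (h, w), c in merged.items():
        history_totals[h] += c
    return {(h, w): c / history_totals[h] for (h, w), c in merged.items()}

out_domain = Counter({("the", "court"): 50, ("the", "patient"): 2})
in_domain = Counter({("the", "patient"): 10})   # small in-domain adaptation sample
probs = count_merge(out_domain, in_domain)
print(round(probs[("the", "patient")], 3))      # boosted by the adaptation data
```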
16. THE DESIGN PRINCIPLES AND ALGORITHMS OF A WEIGHTED GRAMMAR LIBRARY
- Author
-
Cyril Allauzen, Mehryar Mohri, and Brian Roark
- Subjects
Theoretical computer science, Grammar, Programming language, Computer science, Biosequence, Design elements and principles, Variety (linguistics), Automaton, Rule-based machine translation, Computer Science (miscellaneous), Software design, Representation (mathematics), Algorithm
- Abstract
We present the software design principles, algorithms, and utilities of a general weighted grammar library, the GRM Library, that can be used in a variety of applications in text, speech, and biosequence processing. Several of the algorithms and utilities of this library are described, including in some cases their pseudocodes and pointers to their use in applications. The algorithms and the utilities were designed to support a wide variety of semirings and the representation and use of large grammars and automata of several hundred million rules or transitions.
- Published
- 2005
- Full Text
- View/download PDF
17. Robust garden path parsing
- Author
-
Brian Roark
- Subjects
Linguistics and Language, Parsing, Computer science, Probabilistic logic, Recursive descent parser, Top-down parsing, Language and Linguistics, Canonical LR parser, Simple LR parser, Parser combinator, Artificial intelligence, GLR parser, Software, Natural language processing
- Abstract
This paper presents modifications to a standard probabilistic context-free grammar that enable a predictive parser to avoid garden pathing without resorting to any ad-hoc heuristic repair. The resulting parser is shown to apply efficiently to both newspaper text and telephone conversations with complete coverage and excellent accuracy. The distribution over trees is peaked enough to allow the parser to find parses efficiently, even with the much larger search space resulting from overgeneration. Empirical results are provided for both Wall St. Journal and Switchboard test corpora.
- Published
- 2004
- Full Text
- View/download PDF
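The parser described above avoids garden pathing by carrying a beam of weighted partial analyses word by word rather than committing to a single analysis. The sketch below shows only the generic beam-pruning step (keep analyses within a probability factor of the best); the `extend` hook and the grammar fragment are invented stand-ins, not the paper's grammar or parser.

```python
def advance_beam(beam, word, extend, base_beam=1e-4):
    """Extend every surviving partial analysis with the next word, then keep
    only analyses whose probability is within base_beam of the best."""
    candidates = [cand for analysis, p in beam for cand in extend(analysis, p, word)]
    if not candidates:
        return []
    best = max(p for _, p in candidates)
    return [(a, p) for a, p in candidates if p >= base_beam * best]

# Toy grammar hook: "raced" is ambiguous between a main-verb and a
# reduced-relative reading, so both analyses stay on the beam.
def extend(analysis, p, word):
    if word == "raced":
        return [(analysis + ["V:raced"], p * 0.8), (analysis + ["RC:raced"], p * 0.1)]
    return [(analysis + [word], p * 0.9)]

beam = [([], 1.0)]
for w in ["the", "horse", "raced", "past", "the", "barn", "fell"]:
    beam = advance_beam(beam, w, extend)
print([a[2] for a, _ in beam])  # both readings of "raced" survive
```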
18. Offline analysis of context contribution to ERP-based typing BCI performance
- Author
-
Brian Roark, Barry Oken, Melanie Fried-Oken, Umut Orhan, and Deniz Erdogmus
- Subjects
Male, Computer science, Speech recognition, Biomedical Engineering, Electroencephalography, Linear discriminant analysis, Cellular and Molecular Neuroscience, Discriminant, Event-related potential, Brain-Computer Interfaces, Humans, Female, Typing, Language model, Symbol rate, Evoked Potentials, Photic Stimulation, Brain–computer interface
- Abstract
Objective. We aim to increase the symbol rate of electroencephalography (EEG) based brain–computer interface (BCI) typing systems by utilizing context information. Approach. Event-related potentials (ERPs) corresponding to a stimulus in EEG can be used to detect the intended target of a person for BCI. This paradigm is widely utilized to build letter-by-letter BCI typing systems. Nevertheless, currently available BCI typing systems still require improvement due to low typing speeds, mainly because of the reliance on multiple repetitions before making a decision to achieve higher typing accuracy. Another possible approach to increasing the speed of typing without significantly reducing accuracy is to use additional context information. In this paper, we study the effect of using a language model (LM) as additional evidence for intent detection. Bayesian fusion of an n-gram symbol model with EEG features is proposed, and regularized discriminant analysis is used to obtain the EEG-based features. The target detection accuracies are rigorously evaluated for varying LM orders, as well as the number of ERP-inducing repetitions. Main results. The results demonstrate that the LMs contribute significantly to letter classification accuracy. For instance, we find that single-trial ERP detection supported by a 4-gram LM may achieve the same performance as 3-trial ERP classification for the non-initial letters of words. Significance. Overall, the fusion of evidence from EEG and LMs yields a significant opportunity to increase the symbol rate of a BCI typing system.
- Published
- 2013
- Full Text
- View/download PDF
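The Bayesian fusion described above makes the posterior over the intended letter proportional to the EEG evidence times the n-gram LM probability given the letters already typed. A minimal sketch with made-up numbers follows; it is not the study's classifier or data.

```python
def fuse(eeg_likelihood, lm_prob):
    """Posterior over candidate symbols: EEG likelihood times LM prior,
    renormalized over the candidate set."""
    scores = {s: eeg_likelihood[s] * lm_prob[s] for s in eeg_likelihood}
    z = sum(scores.values())
    return {s: v / z for s, v in scores.items()}

# Typed context "th_": EEG alone slightly prefers 'r', but the LM prior
# for the next letter pushes the decision to 'e'.
eeg_likelihood = {"e": 0.30, "r": 0.35, "a": 0.35}
lm_prob = {"e": 0.70, "r": 0.20, "a": 0.10}
posterior = fuse(eeg_likelihood, lm_prob)
print(max(posterior, key=posterior.get), round(posterior["e"], 2))
```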