Author: "Suyoun Kim" / Publisher: arxiv - Searchworks@Jio Institute Digital Library Search Results

Your search for "Suyoun Kim" returned 4 results.

1. Deliberation Model for On-Device Spoken Language Understanding

Author: Duc Le, Akshat Shrivastava, Paden D. Tomasello, Suyoun Kim, Aleksandr Livshits, Ozlem Kalinli, and Michael Seltzer
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computation and Language (cs.CL), Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: We propose a novel deliberation-based approach to end-to-end (E2E) spoken language understanding (SLU), where a streaming automatic speech recognition (ASR) model produces the first-pass hypothesis and a second-pass natural language understanding (NLU) component generates the semantic parse by conditioning on both the ASR model's text and audio embeddings. By formulating E2E SLU as a generalized decoder, our system is able to support complex compositional semantic structures. Furthermore, the sharing of parameters between ASR and NLU makes the system especially suitable for resource-constrained (on-device) environments. Our proposed approach consistently outperforms strong pipeline NLU baselines by 0.60% to 0.65% on the spoken version of the TOPv2 dataset (STOP). We demonstrate that the fusion of text and audio features, coupled with the system's ability to rewrite the first-pass hypothesis, makes our approach more robust to ASR errors. Finally, we show that our approach can significantly reduce the degradation when moving from natural speech to synthetic speech training, but more work is required to make text-to-speech (TTS) a viable solution for scaling up E2E SLU.
Comment: Accepted for publication at INTERSPEECH 2022
Published: 2022
Full Text: View/download PDF

2. Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion

Author: Yatharth Saraf, Michael L. Seltzer, Suyoun Kim, Christian Fuegen, Duc Le, Yangyang Shi, Ozlem Kalinli, Julian Chan, Gil Keren, Yuan Shangguan, Mahaveer Jain, and Jay Mahadeokar
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computation and Language, Artificial neural network, Computer science, Speech recognition, Word error rate, Modular design, Machine Learning (cs.LG), End-to-end principle, Audio and Speech Processing (eess.AS), Trie, FOS: Electrical engineering, electronic engineering, information engineering, Language model, Computation and Language (cs.CL), Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: How to leverage dynamic contextual information in end-to-end speech recognition has remained an active research area. Previous solutions to this problem were either designed for specialized use cases that did not generalize well to open-domain scenarios, did not scale to large biasing lists, or underperformed on rare long-tail words. We address these limitations by proposing a novel solution that combines shallow fusion, trie-based deep biasing, and neural network language model contextualization. These techniques result in a significant 19.5% relative Word Error Rate (WER) improvement over existing contextual biasing approaches and a 5.4% to 9.3% improvement compared to a strong hybrid baseline on both open-domain and constrained contextualization tasks, where the targets consist of mostly rare long-tail words. Our final system remains lightweight and modular, allowing for quick modification without model re-training.
Comment: Accepted for presentation at INTERSPEECH 2021
Published: 2021
Full Text: View/download PDF
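
The shallow-fusion side of record 2 can be illustrated with a generic prefix-trie biasing scorer. This is a minimal sketch under stated assumptions, not the authors' system: a hypothesis is a token list with per-token ASR and external-LM log-probabilities, shallow fusion is taken to be the common log-linear combination `asr + lm_weight * lm`, and each token that extends a match in a trie of biasing phrases earns a fixed `bias_bonus`, which is retracted if the match fails before a full phrase is completed. The names `build_trie`, `biased_score`, `lm_weight`, and `bias_bonus`, and their default values, are hypothetical; the neural deep-biasing component is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)
    is_end: bool = False  # True if a biasing phrase ends at this node


def build_trie(phrases):
    """Build a prefix trie over biasing phrases given as token lists (hypothetical helper)."""
    root = TrieNode()
    for phrase in phrases:
        node = root
        for tok in phrase:
            node = node.children.setdefault(tok, TrieNode())
        node.is_end = True
    return root


def biased_score(tokens, asr_logprobs, lm_logprobs, trie,
                 lm_weight=0.3, bias_bonus=2.0):
    """Score a finished hypothesis with shallow fusion plus trie-based biasing.

    lm_weight and bias_bonus are illustrative values, not tuned settings.
    """
    score, node, pending = 0.0, trie, 0  # pending = bonuses not yet committed
    for tok, asr_lp, lm_lp in zip(tokens, asr_logprobs, lm_logprobs):
        # Shallow fusion: log-linear interpolation of ASR and external LM scores.
        score += asr_lp + lm_weight * lm_lp
        if tok not in node.children and node is not trie:
            # The partial match broke: retract its bonus and restart at the root.
            score -= pending * bias_bonus
            node, pending = trie, 0
        if tok in node.children:
            node = node.children[tok]
            score += bias_bonus
            pending += 1
            if node.is_end:
                pending = 0  # a full phrase matched: keep its bonus
    # A phrase left unfinished at the end of the hypothesis earns no bonus.
    return score - pending * bias_bonus


if __name__ == "__main__":
    trie = build_trie([["new", "york"]])
    # Per-token log-probabilities are made-up numbers for illustration.
    print(biased_score(["in", "new", "york"], [-1.0] * 3, [-2.0] * 3, trie,
                       lm_weight=0.5, bias_bonus=1.0))  # -4.0
```

Retracting the bonus on a failed match keeps a hypothesis from profiting from a phrase it never completes. A real beam-search decoder would apply the same bookkeeping incrementally to each partial hypothesis rather than rescoring finished token lists.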

3. Dialog-context aware end-to-end speech recognition

Author: Florian Metze and Suyoun Kim
Subjects: FOS: Computer and information sciences, Context model, Computer Science - Computation and Language, Computer science, Speech recognition, Context (language use), 02 engineering and technology, Data modeling, End-to-end principle, 0202 electrical engineering, electronic engineering, information engineering, Computation and Language (cs.CL), Sentence, Decoding methods
Abstract: Existing speech recognition systems are typically built at the sentence level, although it is known that dialog context, e.g. higher-level knowledge that spans across sentences or speakers, can help the processing of long conversations. The recent progress in end-to-end speech recognition systems promises to integrate all available information (e.g. acoustic, language resources) into a single model, which is then jointly optimized. It seems natural that such dialog context information should thus also be integrated into the end-to-end models to further improve recognition accuracy. In this work, we present a dialog-context aware speech recognition model, which explicitly uses context information beyond sentence-level information, in an end-to-end fashion. Our dialog-context model captures a history of sentence-level context so that the whole system can be trained with dialog-context information in an end-to-end manner. We evaluate our proposed approach on the Switchboard conversational speech corpus and show that our system outperforms a comparable sentence-level end-to-end speech recognition system.
Comment: Submitted to SLT
Published: 2018
Full Text: View/download PDF

4. Improved training for online end-to-end speech recognition systems

Author: Michael L. Seltzer, Rui Zhao, Suyoun Kim, and Jinyu Li
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computer science, Speech recognition, Initialization, Word error rate, Pronunciation, Lexicon, End-to-end principle, Computation and Language (cs.CL), Smoothing
Abstract: Achieving high accuracy with end-to-end speech recognizers requires careful parameter initialization prior to training. Otherwise, the networks may fail to find a good local optimum. This is particularly true for online networks, such as unidirectional LSTMs. Currently, the best strategy to train such systems is to bootstrap the training from a tied-triphone system. However, this is time-consuming, and more importantly, is impossible for languages without a high-quality pronunciation lexicon. In this work, we propose an initialization strategy that uses teacher-student learning to transfer knowledge from a large, well-trained, offline end-to-end speech recognition model to an online end-to-end model, eliminating the need for a lexicon or any other linguistic resources. We also explore curriculum learning and label smoothing and show how they can be combined with the proposed teacher-student learning for further improvements. We evaluate our methods on a Microsoft Cortana personal assistant task and show that the proposed method results in a 19% relative improvement in word error rate compared to a randomly-initialized baseline system.
Comment: Interspeech 2018
Published: 2017
Full Text: View/download PDF
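
Teacher-student learning and label smoothing, as described in record 4, can both be framed as cross-entropy against a soft target distribution. The sketch below is a generic illustration under assumptions, not the paper's training recipe: it treats a single output step over a small set of logits, trains the student (for example, an online unidirectional model) toward the teacher's (for example, an offline model's) softmax posteriors, and, for label smoothing, replaces the one-hot target with (1 - epsilon) * one-hot + epsilon / K. The function names and the `epsilon` default are hypothetical, and how the paper weights or sequences these losses is not shown.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(target_probs, student_logits):
    """-sum_k p_k * log q_k for a target distribution p and student logits."""
    return -sum(p * lq for p, lq in zip(target_probs, log_softmax(student_logits)))


def teacher_student_loss(teacher_logits, student_logits):
    """Distillation: fit the student to the teacher's soft posteriors."""
    teacher_probs = [math.exp(lp) for lp in log_softmax(teacher_logits)]
    return cross_entropy(teacher_probs, student_logits)


def label_smoothed_loss(label, student_logits, epsilon=0.1):
    """Cross-entropy against (1 - epsilon) * one-hot(label) + epsilon / K."""
    k = len(student_logits)
    target = [epsilon / k + ((1.0 - epsilon) if i == label else 0.0)
              for i in range(k)]
    return cross_entropy(target, student_logits)


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]  # made-up logits for illustration
    print(teacher_student_loss(teacher, teacher))          # minimum: teacher entropy
    print(teacher_student_loss(teacher, [0.0, 0.0, 0.0]))  # larger: uniform student
    print(label_smoothed_loss(0, [3.0, 0.0, 0.0]))
```

Because both losses take a full target distribution, the same `cross_entropy` helper serves either one; the distillation loss is minimized exactly when the student's distribution matches the teacher's.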