Language independent end-to-end architecture for joint language identification and speech recognition

Authors :: John R. Hershey
Shinji Watanabe
Takaaki Hori
Source :: ASRU
Publication Year :: 2017
Publisher :: IEEE, 2017.
Abstract: End-to-end automatic speech recognition (ASR) can significantly reduce the burden of developing ASR systems for new languages, by eliminating the need for linguistic information such as pronunciation dictionaries. This also creates an opportunity, which we fully exploit in this paper, to build a monolithic multilingual ASR system with a language-independent neural network architecture. We present a model that can recognize speech in 10 different languages, by directly performing grapheme (character/chunked-character) based speech recognition. The model is based on our hybrid attention/connectionist temporal classification (CTC) architecture which has previously been shown to achieve the state-of-the-art performance in several ASR benchmarks. Here we augment its set of output symbols to include the union of character sets appearing in all the target languages. These include Roman and Cyrillic Alphabets, Arabic numbers, simplified Chinese, and Japanese Kanji/Hiragana/Katakana characters (5,500 characters in all). This allows training of a single multilingual model, whose parameters are shared across all the languages. The model can jointly identify the language and recognize the speech, automatically formatting the recognized text in the appropriate character set. The experiments, which used speech databases composed of Wall Street Journal (English), Corpus of Spontaneous Japanese, HKUST Mandarin CTS, and Voxforge (German, Spanish, French, Italian, Dutch, Portuguese, Russian), demonstrate comparable/superior performance relative to language-dependent end-to-end ASR systems.
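The shared output layer described in the abstract can be illustrated with a small sketch. It builds one symbol table from the union of the characters seen in every language, so a single model can emit text in any of the scripts. The toy transcripts, the function names, and in particular the bracketed language tags (`[EN]`, `[RU]`, `[JP]`) placed at the start of each target sequence are assumptions made for illustration; the abstract states only that the model identifies the language jointly with recognition, not how the language label is represented.

```python
from typing import Dict, List, Tuple

# Toy per-language transcripts (hypothetical, not taken from the paper's corpora).
CORPORA: Dict[str, List[str]] = {
    "EN": ["hello world"],
    "RU": ["привет мир"],
    "JP": ["こんにちは世界"],
}


def build_vocab(corpora: Dict[str, List[str]]) -> Dict[str, int]:
    """Map language tags and the union of all characters to one shared id space."""
    # Assumption: one tag symbol per language, e.g. "[EN]".
    lang_tags = [f"[{lang}]" for lang in sorted(corpora)]
    # Union of the character sets across every language.
    chars = sorted({ch for texts in corpora.values() for text in texts for ch in text})
    return {sym: idx for idx, sym in enumerate(lang_tags + chars)}


def encode(lang: str, text: str, vocab: Dict[str, int]) -> List[int]:
    """Turn a transcript into symbol ids, with the language tag as the first symbol."""
    return [vocab[f"[{lang}]"]] + [vocab[ch] for ch in text]


def decode(ids: List[int], vocab: Dict[str, int]) -> Tuple[str, str]:
    """Split a symbol-id sequence back into (language tag, text)."""
    inverse = {idx: sym for sym, idx in vocab.items()}
    symbols = [inverse[i] for i in ids]
    return symbols[0], "".join(symbols[1:])


vocab = build_vocab(CORPORA)
ids = encode("RU", "привет мир", vocab)
tag, text = decode(ids, vocab)
```

With a target sequence of this shape, predicting the first symbol amounts to language identification and predicting the remaining symbols amounts to recognition, which is one plausible way to read the "joint" in the title.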

Subjects :: Kanji
Language identification
Rule-based machine translation
Computer science
Character (computing)
Katakana
Speech recognition
Character encoding
VoxForge
Hiragana

Details

Database :: OpenAIRE
Journal :: 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
Accession number :: edsair.doi...........4f897adff74bedbb6110ba73ce63895c
Full Text :: https://doi.org/10.1109/asru.2017.8268945


