Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems
- Publication Year : 2023
- Publisher : arXiv, 2023.
Abstract
- Current ASR systems are mainly trained and evaluated at the utterance level, yet long-range cross-utterance context can also be incorporated. A key task is to derive a suitable compact representation of the most relevant history contexts. In contrast to previous research based either on LSTM-RNN encoded histories, which attenuate information from longer-range contexts, or on frame-level concatenation of Transformer context embeddings, in this paper compact low-dimensional cross-utterance contextual features are learned in the Conformer-Transducer encoder using specially designed attention pooling layers applied over efficiently cached preceding-utterance history vectors. Experiments on the 1000-hr Gigaspeech corpus demonstrate that the proposed contextualized streaming Conformer-Transducers outperform the baseline using utterance-internal context only, with statistically significant absolute WER reductions of 0.7% and 0.5% (4.3% and 3.1% relative) on the dev and test data.
- Comment : Accepted by INTERSPEECH 2023
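The attention pooling described in the abstract can be illustrated with a minimal sketch: a learned query vector attends over a small cache of per-utterance history vectors and produces a single compact context vector. This is an assumption-laden toy in NumPy, not the paper's actual architecture; the dimension `d`, cache size `n_hist`, and the single learned `query` are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8        # hidden size (hypothetical)
n_hist = 4   # number of cached preceding-utterance history vectors (hypothetical)

# Cached history vectors, one per preceding utterance; in the real system
# these would come from the Conformer-Transducer encoder.
history = rng.normal(size=(n_hist, d))

# A single learned pooling query (hypothetical stand-in for the paper's
# "specially designed attention pooling layers").
query = rng.normal(size=(d,))

# Scaled dot-product attention scores over the cached history.
scores = history @ query / np.sqrt(d)          # shape: (n_hist,)

# Softmax to attention weights.
weights = np.exp(scores - scores.max())
weights /= weights.sum()                       # sums to 1

# Weighted sum yields one compact, low-dimensional context vector.
context = weights @ history                    # shape: (d,)
```

The key property is compression: however many utterances are cached, the pooled context stays a fixed `d`-dimensional vector that can be fed to the encoder alongside the current utterance.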
- Subjects :
- FOS: Computer and information sciences
Computer Science - Computation and Language
Audio and Speech Processing (eess.AS)
FOS: Electrical engineering, electronic engineering, information engineering
Computation and Language (cs.CL)
Electrical Engineering and Systems Science - Audio and Speech Processing
Details
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....dfa7b03651e510fe17d526614c7701d1
- Full Text :
- https://doi.org/10.48550/arxiv.2306.13307