Back to Search
Start Over
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
- Publication Year :
- 2020
- Publisher :
- arXiv, 2020.
-
Abstract
- We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler. wav2vec 2.0 masks the speech input in the latent space and solves a contrastive task defined over a quantization of the latent representations which are jointly learned. Experiments using all labeled data of Librispeech achieve 1.8/3.3 WER on the clean/other test sets. When lowering the amount of labeled data to one hour, wav2vec 2.0 outperforms the previous state of the art on the 100 hour subset while using 100 times less labeled data. Using just ten minutes of labeled data and pre-training on 53k hours of unlabeled data still achieves 4.8/8.2 WER. This demonstrates the feasibility of speech recognition with limited amounts of labeled data.
- Subjects :
- FOS: Computer and information sciences
Computer Science - Machine Learning
Sound (cs.SD)
Computer Science - Computation and Language
Audio and Speech Processing (eess.AS)
FOS: Electrical engineering, electronic engineering, information engineering
Computation and Language (cs.CL)
Computer Science - Sound
Electrical Engineering and Systems Science - Audio and Speech Processing
Machine Learning (cs.LG)
Subjects
Details
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....3f81c36114c918f2c04e027d78358bbd
- Full Text :
- https://doi.org/10.48550/arxiv.2006.11477