The Zero Resource Speech Challenge 2021: Spoken Language Modelling
- Author
Ewan Dunbar, Emmanuel Dupoux, Mathieu Bernard, Patricia Rozé, Maureen de Seyssel, Morgane Rivière, Tu Anh Nguyen, Nicolas Hamilakis, Eugene Kharitonov
Affiliations: University of Toronto; Laboratoire de sciences cognitives et psycholinguistique (LSCP), Département d'Études Cognitives, École normale supérieure - Paris (ENS), EHESS, CNRS, Université PSL; Apprentissage machine et développement cognitif (CoML), Inria de Paris; Facebook AI Research [Paris] (FAIR)
Funding: ANR-19-P3IA-0001 (PRAIRIE), ANR-17-EURE-0017 (FrontCog), ANR-10-IDEX-0001 (PSL)
- Subjects
FOS: Computer and information sciences; Computer Science - Computation and Language (cs.CL); Computer Science - Artificial Intelligence (cs.AI); Zero-resource; Low-resource; Spoken language; Language modelling; Language model; Unsupervised speech; Cognitive benchmarks; ABX test; Encoder; Natural language processing; Artificial intelligence
- Abstract
We present the Zero Resource Speech Challenge 2021, which asks participants to learn a language model directly from audio, without any text or labels. The challenge is based on the Libri-light dataset, which provides up to 60k hours of audio from English audio books without any associated text. We provide a baseline pipeline consisting of an encoder based on contrastive predictive coding (CPC), a quantizer ($k$-means) and a standard language model (BERT or LSTM). The metrics evaluate the learned representations at the acoustic (ABX discrimination), lexical (spot-the-word), syntactic (acceptability judgment) and semantic (similarity judgment) levels. We present an overview of the eight submitted systems from four groups and discuss the main results.
Submitted to Interspeech 2021. arXiv admin note: text overlap with arXiv:2011.11588.
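To make the pipeline summarised in the abstract concrete, the sketch below shows the discrete-unit flow (CPC features → k-means clusters → unit language model) in Python. It is a minimal illustration under stated assumptions, not the challenge's actual baseline code: `cpc_encode` and the smoothed unigram "language model" are hypothetical stand-ins for the trained CPC encoder and the BERT/LSTM models, and only the k-means step uses a real library (scikit-learn).

```python
# Minimal sketch of the baseline pipeline described in the abstract:
# CPC encoder -> k-means quantizer -> language model over discrete units.
# cpc_encode() and the unigram "LM" are hypothetical stand-ins, not the
# challenge's actual components; only the overall structure is faithful.

import numpy as np
from sklearn.cluster import KMeans

def cpc_encode(waveform: np.ndarray) -> np.ndarray:
    """Stand-in for a trained CPC encoder: audio -> (frames, dim) features."""
    n_frames = len(waveform) // 160        # roughly one frame per 10 ms at 16 kHz
    return np.random.randn(n_frames, 256)  # placeholder features

# 1) Fit the quantizer: k-means over CPC frames yields discrete "pseudo-phone" units.
train_features = np.vstack([cpc_encode(np.random.randn(16000)) for _ in range(10)])
quantizer = KMeans(n_clusters=50, n_init=10).fit(train_features)

def audio_to_units(waveform: np.ndarray) -> list:
    """Discretize an utterance into a sequence of cluster indices."""
    return quantizer.predict(cpc_encode(waveform)).tolist()

# 2) Train a language model on the unit sequences (BERT or LSTM in the baseline);
#    a smoothed unigram model stands in here purely for illustration.
train_units = quantizer.predict(train_features)
unigram = np.bincount(train_units, minlength=quantizer.n_clusters) + 1.0
unigram /= unigram.sum()

def unit_logprob(units: list) -> float:
    """Log pseudo-probability of an utterance under the toy unigram 'LM'."""
    return float(np.log(unigram[units]).sum())

# 3) Metrics such as spot-the-word compare LM scores for matched word/non-word
#    audio pairs; here we simply score two random "utterances" end to end.
utt_a, utt_b = np.random.randn(16000), np.random.randn(16000)
print(unit_logprob(audio_to_units(utt_a)), unit_logprob(audio_to_units(utt_b)))
```

In the actual challenge setup, the acoustic ABX metric is computed on the learned representations themselves, while the lexical, syntactic and semantic metrics compare scores produced by the language model over the discrete units.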
- Published
- 2021