
Speech Sequence Embeddings using Nearest Neighbors Contrastive Learning

Authors :
Robin Algayres
Adel Nabli
Benoît Sagot
Emmanuel Dupoux
Apprentissage machine et développement cognitif (CoML)
Laboratoire de sciences cognitives et psycholinguistique (LSCP)
Département d'Etudes Cognitives - ENS Paris (DEC)
École normale supérieure - Paris (ENS-PSL)
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-École normale supérieure - Paris (ENS-PSL)
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-École des hautes études en sciences sociales (EHESS)-Centre National de la Recherche Scientifique (CNRS)-Département d'Etudes Cognitives - ENS Paris (DEC)
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-École des hautes études en sciences sociales (EHESS)-Centre National de la Recherche Scientifique (CNRS)-Inria de Paris
Institut National de Recherche en Informatique et en Automatique (Inria)
Automatic Language Modelling and ANAlysis & Computational Humanities (ALMAnaCH)
Inria de Paris
Facebook AI Research [Paris] (FAIR)
Facebook
Université Paris sciences et lettres (PSL)
École des hautes études en sciences sociales (EHESS)
This work was funded in part by Facebook AI Research (Research Grant). This work was performed using HPC resources from GENCI-IDRIS (Grant 2021-[AD011011217]).
ANR-17-EURE-0017, FrontCog, Frontières en cognition (2017)
ANR-10-IDEX-0001, PSL, Paris Sciences et Lettres (2010)
ANR-19-P3IA-0001, PRAIRIE, PaRis Artificial Intelligence Research InstitutE (2019)
Source :
Interspeech 2022 - 23rd INTERSPEECH Conference, Sep 2022, Incheon, South Korea
Publication Year :
2022
Publisher :
HAL CCSD, 2022.

Abstract

We introduce a simple neural encoder architecture that can be trained using an unsupervised contrastive learning objective whose positive samples come from a data-augmented k-Nearest Neighbors search. We show that, when built on top of recent self-supervised audio representations [1, 2, 3], this method can be applied iteratively and yield competitive speech sequence embeddings (SSE), as evaluated on two tasks: query-by-example of random sequences of speech, and spoken term discovery. On both tasks our method pushes the state of the art by a significant margin across 5 different languages. Finally, we establish a benchmark on a query-by-example task on the LibriSpeech dataset to monitor future improvements in the field.
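The core idea of the abstract, picking contrastive positives by nearest-neighbor search, can be illustrated with a minimal sketch. This record does not give the paper's encoder, augmentation scheme, or exact loss, so everything below is an assumption made for illustration: the function names (`cosine`, `knn_positives`, `info_nce`), the toy 2-D embeddings, the use of cosine similarity, and the generic InfoNCE-style loss. Data augmentation and the iterative retraining loop are left out.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def knn_positives(embeddings, k):
    """For each embedding, return the indices of its k most similar others."""
    neighbors = []
    for i, e in enumerate(embeddings):
        scored = [(cosine(e, other), j)
                  for j, other in enumerate(embeddings) if j != i]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        neighbors.append([j for _, j in scored[:k]])
    return neighbors

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE-style loss for one anchor (assumed, not the paper's stated loss)."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

# Hypothetical toy embeddings: two tight pairs pointing in different directions.
embeddings = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
]

# Step 1: nearest-neighbor search selects a positive for each item.
neighbors = knn_positives(embeddings, k=1)

# Step 2: the contrastive loss pulls the anchor toward its neighbor
# and pushes it away from the remaining items.
good_loss = info_nce(embeddings[0], embeddings[1], [embeddings[2], embeddings[3]])
bad_loss = info_nce(embeddings[0], embeddings[2], [embeddings[1], embeddings[3]])
```

In this toy setup the loss is low when the positive is the true nearest neighbor (`good_loss`) and high when a dissimilar item is used as the positive instead (`bad_loss`). That gap is the signal a contrastive encoder would be trained on.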

Details

Language :
English
Database :
OpenAIRE
Journal :
Interspeech 2022 - 23rd INTERSPEECH Conference, Sep 2022, Incheon, South Korea
Accession number :
edsair.doi.dedup.....3260337894b9f00327e592d347f92b52