
Do self-supervised speech models develop human-like perception biases?

Authors:
Millet, Juliette
Dunbar, Ewan
Apprentissage machine et développement cognitif (CoML)
Laboratoire de sciences cognitives et psycholinguistique (LSCP)
Département d'Études Cognitives - ENS Paris (DEC)
École normale supérieure - Paris (ENS-PSL)
Université Paris sciences et lettres (PSL)
École des hautes études en sciences sociales (EHESS)
Centre National de la Recherche Scientifique (CNRS)
Inria de Paris, Institut National de Recherche en Informatique et en Automatique (Inria)
Laboratoire de Linguistique Formelle (LLF - UMR 7110), CNRS - Université Paris Cité (UPCité)
Funding:
ANR-17-CE28-0009, GEOMPHON, Perception and learning of speech in the geometric typology of phonological inventories (2017)
ANR-11-IDFI-0023, IIFR, Institut Innovant de Formation par la Recherche (2011)
ANR-11-IDEX-0005, USPC, Université Sorbonne Paris Cité (2011)
ANR-10-LABX-0083, EFL, Empirical Foundations of Linguistics: data, methods, models (2010)
ANR-17-EURE-0017, FrontCog, Frontiers in Cognition (2017)
ANR-10-IDEX-0001, PSL, Paris Sciences et Lettres (2010)
ANR-19-P3IA-0001, PRAIRIE, PaRis Artificial Intelligence Research InstitutE (2019)
Source:
Proceedings of ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, May 2022, Dublin, Ireland, pp. 7591-7605, ⟨10.18653/v1/2022.acl-long.523⟩
Publication Year:
2022
Publisher:
HAL CCSD, 2022.

Abstract

Self-supervised models for speech processing form representational spaces without using any external labels. Increasingly, they appear to be a feasible way of at least partially eliminating costly manual annotations, a problem of particular concern for low-resource languages. But what kind of representational spaces do these models construct? Human perception specializes to the sounds of listeners' native languages. Does the same thing happen in self-supervised models? We examine the representational spaces of three kinds of state-of-the-art self-supervised models: wav2vec 2.0, HuBERT and contrastive predictive coding (CPC), and compare them with the perceptual spaces of French-speaking and English-speaking human listeners, both globally and taking account of the behavioural differences between the two language groups. We show that the CPC model shows a small native language effect, but that wav2vec 2.0 and HuBERT seem to develop a universal speech perception space which is not language specific. A comparison against the predictions of supervised phone recognisers suggests that all three self-supervised models capture relatively fine-grained perceptual phenomena, while supervised models are better at capturing coarser, phone-level effects of listeners' native language on perception.
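This record does not spell out the paper's comparison procedure, but as a rough illustration of the kind of analysis the abstract describes, the Python sketch below extracts frame-level representations from a pretrained wav2vec 2.0 checkpoint and computes a dynamic-time-warping (DTW) distance between two stimuli, the sort of model-based distance one can relate to human discrimination behaviour. The checkpoint name (facebook/wav2vec2-base), the cosine frame cost, and the length normalisation are assumptions made for this sketch, not the authors' exact setup.

```python
# Hedged sketch: checkpoint, frame cost, and normalisation are illustrative
# assumptions, not the paper's exact procedure.
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

MODEL = "facebook/wav2vec2-base"  # assumed checkpoint, for illustration only
extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL)
model = Wav2Vec2Model.from_pretrained(MODEL).eval()

def represent(waveform: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Return the model's frame-level hidden states for one utterance."""
    inputs = extractor(waveform, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape (1, frames, dim)
    return hidden.squeeze(0).numpy()

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Length-normalised DTW distance between two frame sequences,
    using cosine distance between frames as the local cost."""
    an = a / np.linalg.norm(a, axis=1, keepdims=True)
    bn = b / np.linalg.norm(b, axis=1, keepdims=True)
    cost = 1.0 - an @ bn.T  # pairwise cosine distances, (len(a), len(b))
    acc = np.full((len(a) + 1, len(b) + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return float(acc[len(a), len(b)]) / (len(a) + len(b))

# Example: two one-second stimuli (random noise as stand-ins for real audio)
x = np.random.randn(16000).astype(np.float32)
y = np.random.randn(16000).astype(np.float32)
print(dtw_distance(represent(x), represent(y)))
```

Under this kind of analysis, a larger distance between the representations of two stimuli predicts that the model treats them as more distinct; correlating such distances with listeners' discrimination accuracy is one way to probe whether a model's representational space resembles a human perceptual space.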

Details

Language:
English
Database:
OpenAIRE
Journal:
Proceedings of ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, May 2022, Dublin, Ireland, pp. 7591-7605, ⟨10.18653/v1/2022.acl-long.523⟩
Accession number:
edsair.doi.dedup.....dbe36c6fb7a3706d766e8572ac371282