1. Privacy and utility of x-vector based speaker anonymization
- Author
- Brij Mohan Lal Srivastava, Mohamed Maouche, Md Sahidullah, Emmanuel Vincent, Aurelien Bellet, Marc Tommasi, Natalia Tomashenko, Xin Wang, and Junichi Yamagishi
- Affiliations: Machine Learning in Information Networks (MAGNET), Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria), Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille, Université de Lille, Centre National de la Recherche Scientifique (CNRS); Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Université de Lorraine (UL), CNRS; Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU), Centre d'Enseignement et de Recherche en Informatique - CERI; National Institute of Informatics (NII); Grid'5000
- Funding: ANR-18-CE23-0018, DEEP-PRIVACY, Apprentissage distribué, personnalisé, préservant la privacité pour le traitement de la parole (2018), APPEL À PROJETS GÉNÉRIQUE 2018 (AAPG2018); ANR-19-DATA-0008, Harpocrates, Open data, outils et challenges pour l'anonymisation des voix (2019), DONNEES; European Project 825081, H2020, COMPRISE, Cost effective, Multilingual, Privacy-driven voice-enabled Services (2018-12-01 - 2021-11-30)
- Subjects
[INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL], speaker anonymization, speaker identification, speech recognition, privacy, linkability, utility, Acoustics and Ultrasonics, Computational Mathematics, Computer Science (miscellaneous), Electrical and Electronic Engineering
- Abstract
We study the scenario where individuals (speakers) contribute to the publication of an anonymized speech corpus. Data users then leverage this public corpus to perform downstream tasks (such as training automatic speech recognition systems), while attackers may try to de-anonymize it based on auxiliary knowledge they collect. Motivated by this scenario, speaker anonymization aims to conceal the speaker identity while preserving the quality and usefulness of speech data. In this paper, we study x-vector based speaker anonymization, the leading approach in the recent Voice Privacy Challenge, which converts an input utterance into that of a random pseudo-speaker. We show that the strength of the anonymization varies significantly depending on how the pseudo-speaker is selected. In particular, we investigate four design choices: the distance measure between speakers, the region of x-vector space where the pseudo-speaker is mapped, the gender selection, and whether to use speaker-level or utterance-level assignment. We assess the quality of anonymization from the perspective of the three actors involved in our threat model, namely the speaker, the user, and the attacker. We measure privacy by the linkability score achieved by the attackers, and utility by the decoding word error rate incurred by an ASR model trained on the anonymized data. Experiments on the LibriSpeech dataset confirm that the optimal combination of design choices yields state-of-the-art performance in terms of privacy protection as well as utility. Experiments on the Mozilla Common Voice dataset show that the best design choices with 50 speakers guarantee the same anonymization level against re-identification attacks as raw speech with 20,000 speakers.
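To make the pseudo-speaker design choices more concrete, the following is a minimal sketch of one possible selection rule, not the authors' implementation. It assumes cosine distance as the distance measure, targets the far region of x-vector space, and averages a random subset of the farthest candidates. The function names, the pool of candidate x-vectors, and the parameters `n_far` and `n_avg` are all hypothetical and do not come from the abstract.

```python
import math
import random


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors (assumed distance measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def select_pseudo_speaker(source, pool, n_far=4, n_avg=2, rng=None):
    """Build a pseudo-speaker x-vector for `source` from a candidate `pool`.

    Hypothetical rule: rank the pool by distance from the source, keep the
    `n_far` farthest candidates, draw `n_avg` of them at random, and average.
    """
    rng = rng or random.Random(0)
    # Rank candidates from farthest to nearest (the "far region" choice).
    ranked = sorted(pool, key=lambda cand: cosine_distance(source, cand), reverse=True)
    farthest = ranked[:n_far]
    # The random draw is what makes the pseudo-speaker random.
    chosen = rng.sample(farthest, n_avg)
    dim = len(source)
    return [sum(vec[i] for vec in chosen) / n_avg for i in range(dim)]


# Toy 2-dimensional "x-vectors"; real x-vectors have hundreds of dimensions.
source = [1.0, 0.0]
pool = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [-0.9, -0.1], [0.0, 1.0]]
pseudo = select_pseudo_speaker(source, pool, n_far=2, n_avg=2)
```

In this sketch, speaker-level assignment would correspond to calling `select_pseudo_speaker` once per speaker and reusing the result for all of that speaker's utterances, while utterance-level assignment would call it once per utterance. Gender selection could be modeled by restricting `pool` to candidates of a chosen gender before the call.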
- Published
- 2022