
Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection

Authors :
Li, Kai
Li, Sheng
Lu, Xugang
Akagi, Masato
Liu, Meng
Zhang, Lin
Zeng, Chang
Wang, Longbiao
Dang, Jianwu
Unoki, Masashi
Publication Year :
2022

Abstract

Fake audio detection (FAD) is a technique for distinguishing synthetic speech from natural speech. In most FAD systems, it is essential to remove irrelevant features from acoustic speech while keeping only robust discriminative features. Intuitively, speaker information entangled in acoustic speech should be suppressed for the FAD task. In a deep neural network (DNN)-based FAD system in particular, the model may learn speaker information from the training dataset and consequently fail to generalize well to the testing dataset. In this paper, we propose using a speaker anonymization (SA) technique to suppress speaker information in acoustic speech before it is input into a DNN-based FAD system. We adopted the McAdams-coefficient-based SA (MC-SA) algorithm, with the expectation that the entangled speaker information would then not be involved in DNN-based FAD learning. Based on this idea, we implemented a light convolutional neural network bidirectional long short-term memory (LCNN-BLSTM)-based FAD system and conducted experiments on the Audio Deep Synthesis Detection Challenge (ADD2022) datasets. The results showed that removing speaker information from acoustic speech yielded a relative performance improvement of 17.66% in the first track of ADD2022.
Interspeech 2022, 18-22 September 2022, Incheon, Korea
identifier: https://dspace.jaist.ac.jp/dspace/handle/10119/18158
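The abstract does not spell out how MC-SA modifies the signal. The sketch below follows the commonly described McAdams-coefficient approach: each frame is analysed with linear prediction (LPC), the angle of every complex LPC pole is raised to the power alpha (the McAdams coefficient) while its magnitude is kept, and the frame is resynthesised from the unchanged LPC residual. The function names, the 20 ms sqrt-Hann framing with 50% overlap, the LPC order of 20, and alpha = 0.8 are illustrative assumptions, not the authors' configuration.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import get_window, lfilter


def _lpc(frame, order):
    """Autocorrelation-method LPC; returns A(z) coefficients [1, -c1, ..., -cp]."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]  # lags 0..order
    r[0] *= 1.0 + 1e-9  # tiny white-noise correction for numerical safety
    c = solve_toeplitz(r[:order], r[1 : order + 1])  # Levinson-Durbin solve
    return np.concatenate(([1.0], -c))


def _mcadams_frame(frame, alpha, order):
    """Shift the angle of each complex LPC pole phi -> phi**alpha, then resynthesise."""
    if np.dot(frame, frame) < 1e-12:  # silent frame: nothing to transform
        return frame.copy()
    a = _lpc(frame, order)
    residual = lfilter(a, [1.0], frame)  # inverse (whitening) filter
    poles = np.roots(a).astype(complex)
    is_complex = np.abs(poles.imag) > 1e-12  # real poles are left untouched
    ang = np.angle(poles[is_complex])
    # Apply the McAdams mapping to |phi| and restore the sign so conjugate
    # pairs stay conjugate; clipping keeps angles within [0, pi] (assumption).
    new_ang = np.sign(ang) * np.clip(np.abs(ang) ** alpha, 0.0, np.pi)
    poles[is_complex] = np.abs(poles[is_complex]) * np.exp(1j * new_ang)
    a_new = np.real(np.poly(poles))  # magnitudes unchanged, so filter stays stable
    return lfilter([1.0], a_new, residual)


def mcadams_anonymize(x, sr=16000, alpha=0.8, lp_order=20, win_ms=20.0):
    """Hypothetical MC-SA sketch: frame-wise McAdams transform with overlap-add."""
    x = np.asarray(x, dtype=np.float64)
    n = int(round(sr * win_ms / 1000.0))
    n -= n % 2  # even length so the hop is exactly half a frame
    hop = n // 2
    # sqrt of a periodic Hann window: analysis * synthesis = Hann, which sums
    # to 1 at 50% overlap, so alpha = 1 reconstructs the input.
    win = np.sqrt(get_window("hann", n))
    padded = np.concatenate([np.zeros(hop), x, np.zeros(n)])
    out = np.zeros_like(padded)
    for start in range(0, len(padded) - n + 1, hop):
        frame = padded[start : start + n] * win
        out[start : start + n] += _mcadams_frame(frame, alpha, lp_order) * win
    return out[hop : hop + len(x)]


# Example on a synthetic 1-second signal (a 220 Hz tone plus white noise).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
x = 0.5 * np.sin(2 * np.pi * 220 * t) + 0.1 * rng.standard_normal(t.size)
y_identity = mcadams_anonymize(x, alpha=1.0)  # alpha = 1 leaves the poles unchanged
y_anon = mcadams_anonymize(x, alpha=0.8)      # formant positions are warped
```

Setting alpha = 1 is a convenient sanity check, since the pole angles are unchanged and the overlap-add should return the input. Values below or above 1 move the spectral envelope peaks and thereby alter speaker-related cues, while the residual (excitation) is passed through unchanged.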

Details

Database :
OAIster
Notes :
application/pdf, English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1409776030
Document Type :
Electronic Resource