Integrating frame-level boundary detection and deepfake detection for locating manipulated regions in partially spoofed audio forgery attacks.

Authors :
Cai, Zexin
Li, Ming
Source :
Computer Speech & Language. Apr 2024, Vol. 85, pN.PAG-N.PAG. 1p.
Publication Year :
2024

Abstract

Partially fake audio, a variant of deepfake that involves manipulating audio utterances through the incorporation of fake or externally sourced bona fide audio clips, constitutes a growing threat as an audio forgery attack impacting both human and artificial intelligence applications. Researchers have recently developed valuable databases to aid in the development of effective countermeasures against such attacks. While existing countermeasures mainly focus on identifying partially fake audio at the level of entire utterances or segments, this paper introduces a paradigm shift by proposing frame-level systems. These systems are designed to detect manipulated utterances and pinpoint the specific regions within partially fake audio where the manipulation occurs. Our approach leverages acoustic features extracted from large-scale self-supervised pre-training models, delivering promising results evaluated on diverse, publicly accessible databases. Additionally, we study the integration of boundary and deepfake detection systems, exploring their potential synergies and shortcomings. Importantly, our techniques have yielded impressive results. We have achieved state-of-the-art performance on the test dataset of Track 2 of the ADD 2022 challenge with an equal error rate of 4.4%. Furthermore, our methods exhibit remarkable performance in locating manipulated regions in Track 2 of the ADD 2023 challenge, resulting in a final ADD score of 0.6713 and securing the top position.

• Simultaneously detecting partially spoofed audio and locating the manipulated regions.
• Exploring model integration techniques in identifying fake regions.
• Investigating self-supervised pre-training models in partially spoofing detection.

[ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
0885-2308
Volume :
85
Database :
Academic Search Index
Journal :
Computer Speech & Language
Publication Type :
Academic Journal
Accession number :
174528134
Full Text :
https://doi.org/10.1016/j.csl.2023.101597