Role of Data Augmentation and Effective Conservation of High-Frequency Contents in the Context of Children's Speaker Verification System.

Authors :
Aziz, Shahid
Shahnawazuddin, S.
Source :
Circuits, Systems & Signal Processing. May 2024, Vol. 43 Issue 5, p3139-3159. 21p.
Publication Year :
2024

Abstract

Developing an automatic speaker verification (ASV) system for children's speech presents significant challenges. One major obstacle is the scarcity of domain-specific data. This issue is exacerbated when dealing with short speech utterances, a relatively unexplored area in children's ASV. Voice biometric systems struggle during the enrollment and verification phases when faced with inadequate speech data, both in volume and in duration. To address data scarcity, this paper explores various in-domain and out-of-domain data augmentation techniques. Out-of-domain data from adult speakers, whose acoustic attributes differ from those of children, are modified using techniques such as voice conversion and prosody and formant modification to make them acoustically similar to children's speech. In-domain data augmentation involves perturbing the speed of children's speech. This combined data augmentation approach not only increases the training data volume but also captures missing target attributes, resulting in a significant 43.91% reduction in equal error rate (EER) compared to the baseline system. Additionally, the paper addresses the challenge of preserving higher-frequency components in children's speech. It achieves this by concatenating conventional Mel-frequency cepstral coefficient (MFCC) features with Inverse-Mel-frequency cepstral coefficient (IMFCC) features at the frame level. The low canonical correlation between MFCC and IMFCC feature vectors motivates this fusion. The feature concatenation approach, when combined with the proposed data augmentation, results in an appreciable reduction of 48.51% in the overall EER, demonstrating its effectiveness in improving the performance of children's ASV systems. [ABSTRACT FROM AUTHOR]
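
The frame-level MFCC and IMFCC concatenation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes one common construction of the inverse-Mel filterbank (the Mel filterbank mirrored along the frequency axis, so that the narrow filters sit at high frequencies), an unnormalized DCT-II, and illustrative settings (16 kHz audio, 25 ms frames, 10 ms hop, 26 filters, 13 cepstra per stream) that do not come from the record. All function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters evenly spaced on the Mel scale, shape (n_filters, n_fft//2 + 1)."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    hz_pts = mel_to_hz(mel_pts)
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def power_spectrogram(signal, n_fft=512, frame_len=400, hop=160):
    """Hamming-windowed power spectrum per frame, shape (n_frames, n_fft//2 + 1)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2

def dct_matrix(n_ceps, n_filters):
    """Unnormalized DCT-II basis, shape (n_ceps, n_filters)."""
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_filters)[None, :]
    return np.cos(np.pi * k * (2 * n + 1) / (2.0 * n_filters))

def cepstra(power, fb, n_ceps):
    """Log filterbank energies followed by a DCT, shape (n_frames, n_ceps)."""
    log_energy = np.log(power @ fb.T + 1e-10)
    return log_energy @ dct_matrix(n_ceps, fb.shape[0]).T

def mfcc_imfcc(signal, sr=16000, n_filters=26, n_ceps=13, n_fft=512):
    """Concatenate MFCC and IMFCC vectors for every frame."""
    power = power_spectrogram(signal, n_fft=n_fft)
    mel_fb = mel_filterbank(n_filters, n_fft, sr)
    # Assumed inverse-Mel construction: mirror the Mel filterbank in frequency
    # and reverse the filter order so that filter 0 still covers the lowest band.
    inv_fb = mel_fb[::-1, ::-1]
    mfcc = cepstra(power, mel_fb, n_ceps)
    imfcc = cepstra(power, inv_fb, n_ceps)
    return np.concatenate([mfcc, imfcc], axis=1)

# Usage on one second of synthetic noise standing in for a speech utterance.
rng = np.random.default_rng(0)
features = mfcc_imfcc(rng.standard_normal(16000))
print(features.shape)  # (98, 26)
```

Each row of `features` holds 13 MFCCs followed by 13 IMFCCs for the same frame, so the Mel stream resolves low frequencies finely while the inverse-Mel stream does the same for the high-frequency region that the abstract identifies as important in children's speech.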

Details

Language :
English
ISSN :
0278081X
Volume :
43
Issue :
5
Database :
Academic Search Index
Journal :
Circuits, Systems & Signal Processing
Publication Type :
Academic Journal
Accession number :
176340056
Full Text :
https://doi.org/10.1007/s00034-024-02598-1