Acoustical and perceptual study of voice disguise by age modification in speaker verification

Authors :: Rosa González Hautamäki
Tomi Kinnunen
Sahidullah
Ville Hautamäki
School of Computing, activities
Source :: Speech Communication. 95:1-15
Publication Year :: 2017
Publisher :: Elsevier BV, 2017.
Abstract: The task of speaker recognition is feasible when the speakers are co-operative or wish to be recognized. While modern automatic speaker verification (ASV) systems and some listeners are good at recognizing speakers from modal, unmodified speech, the task becomes notoriously difficult in situations of deliberate voice disguise when the speaker aims at masking his or her identity. We approach voice disguise from the perspective of acoustical and perceptual analysis using a self-collected corpus of 60 native Finnish speakers (31 female, 29 male) producing utterances in normal, intended young and intended old voice modes. The normal voices form a starting point and we are interested in studying how the two disguise modes impact the acoustical parameters and perceptual speaker similarity judgments. First, we study the effect of disguise as a relative change in fundamental frequency (F0) and formant frequencies (F1 to F4) from modal to disguised utterances. Next, we investigate whether or not speaker comparisons that are deemed easy or difficult by a modern ASV system have a similar difficulty level for the human listeners. Further, we study affecting factors from listener-related self-reported information that may explain a particular listener’s success or failure in speaker similarity assessment. Our acoustic analysis reveals a systematic increase in relative change in mean F0 for the intended young voices while for the intended old voices, the relative change is less prominent in most cases. Concerning the formants F1 through F4, 29% (for male) and 30% (for female) of the utterances did not exhibit a significant change in any formant value, while the remaining ∼ 70% of utterances had significant changes in at least one formant. Our listening panel consists of 70 listeners, 32 native and 38 non-native, who listened to 24 utterance pairs selected using rankings produced by an ASV system. 
The results indicate that speaker pairs categorized as easy by our ASV system were also easy for the average listener. Similarly, the listeners made more errors in the difficult trials. The listening results indicate that target (same speaker) trials were more difficult for the non-native group, while the performance for the non-target pairs was similar for both native and non-native groups.

Final draft, peer reviewed.

Details

ISSN :: 0167-6393
Volume :: 95
Database :: OpenAIRE
Journal :: Speech Communication
Accession number :: edsair.doi.dedup.....fc3db38f4d35b382960a43874112b337
Full Text :: https://doi.org/10.1016/j.specom.2017.10.002
