Exploring Native and Non-Native English Child Speech Recognition With Whisper

Authors :
Rishabh Jain
Andrei Barcovschi
Mariam Yahayah Yiwere
Peter Corcoran
Horia Cucu
Source :
IEEE Access, Vol 12, Pp 41601-41610 (2024)
Publication Year :
2024
Publisher :
IEEE, 2024.

Abstract

Modern end-to-end Automatic Speech Recognition (ASR) systems struggle to recognise children’s speech. This challenge is due to the high acoustic variability in children’s voices and the scarcity of child speech training data, particularly for accented speech or low-resource languages. This study focuses on improving ASR performance on native and non-native English child speech using publicly available datasets. We evaluate how the large-scale Whisper models (trained on a large amount of adult speech data) perform on child speech. In addition, we perform fine-tuning experiments using different child speech datasets to investigate the performance of Whisper ASR on non-native English-speaking children’s speech. Our findings indicate relative Word Error Rate (WER) improvements ranging from 29% to 89% over previous benchmarks on the same datasets. Notably, these gains were achieved by fine-tuning with only a 10% sample of unseen non-native datasets. These results demonstrate the potential of Whisper for improving ASR in a low-resource scenario for non-native child speech.
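
The abstract reports its gains as relative WER improvements. As a minimal sketch of what such a figure usually means, the Python below computes WER as word-level edit distance divided by reference length, and relative improvement as (baseline WER - new WER) / baseline WER. This is the conventional definition and is assumed here; the record does not state the exact formula the authors used. The function names and the example sentences and numbers are hypothetical illustrations, not results from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Levenshtein distance over words, keeping only the previous DP row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer (assumed definition)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical transcripts: one deleted word out of six reference words.
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER: {wer:.3f}")

    # Hypothetical WER values, chosen only to illustrate the arithmetic.
    gain = relative_wer_improvement(baseline_wer=0.20, new_wer=0.10)
    print(f"Relative improvement: {gain:.0%}")
```

Under this definition, a relative improvement expresses the reduction as a share of the baseline error, so the same percentage can correspond to very different absolute WER changes depending on how high the baseline was.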

Details

Language :
English
ISSN :
2169-3536
Volume :
12
Database :
Directory of Open Access Journals
Journal :
IEEE Access
Publication Type :
Academic Journal
Accession number :
edsdoj.0511c1c47cad47ab8ad278b14388fd9e
Document Type :
article
Full Text :
https://doi.org/10.1109/ACCESS.2024.3378738