Frame-by-frame language identification in short utterances using deep neural networks.

Authors :
Gonzalez-Dominguez, Javier
Lopez-Moreno, Ignacio
Moreno, Pedro J.
Gonzalez-Rodriguez, Joaquin
Source :
Neural Networks. Apr 2015, Vol. 64, p49-58. 10p.
Publication Year :
2015

Abstract

This work addresses the use of deep neural networks (DNNs) in automatic language identification (LID) focused on short test utterances. Motivated by their recent success in acoustic modelling for speech recognition, we adapt DNNs to the problem of identifying the language in a given utterance from the short-term acoustic features. We show how DNNs are particularly suitable to perform LID in real-time applications, due to their capacity to emit a language identification posterior at each new frame of the test utterance. We then analyse different aspects of the system, such as the amount of required training data, the number of hidden layers, the relevance of contextual information and the effect of the test utterance duration. Finally, we propose several methods to combine frame-by-frame posteriors. Experiments are conducted on two different datasets: the public NIST Language Recognition Evaluation 2009 (3 s task) and a much larger corpus (of 5 million utterances) known as Google 5M LID, obtained from different Google Services. Reported results show relative improvements of DNNs versus the i-vector system of 40% in the LRE09 3 s task and 76% in Google 5M LID. [ABSTRACT FROM AUTHOR]
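
The abstract mentions combining frame-by-frame posteriors into an utterance-level decision but does not say which methods are used. As a rough illustration only, the sketch below shows two common combination rules, averaging log posteriors and majority voting. The function names and the toy posterior values are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def combine_log_average(frame_posteriors):
    """Average per-frame log posteriors (assumed rule, not the paper's).

    frame_posteriors: list of frames, each a list of per-language
    probabilities. Returns (best_language_index, averaged_log_scores).
    """
    n_langs = len(frame_posteriors[0])
    eps = 1e-12  # floor to avoid log(0)
    scores = [0.0] * n_langs
    for frame in frame_posteriors:
        for k, p in enumerate(frame):
            scores[k] += math.log(max(p, eps))
    scores = [s / len(frame_posteriors) for s in scores]
    best = max(range(n_langs), key=scores.__getitem__)
    return best, scores


def combine_majority_vote(frame_posteriors):
    """Each frame votes for its most likely language (assumed rule)."""
    votes = Counter(
        max(range(len(frame)), key=frame.__getitem__)
        for frame in frame_posteriors
    )
    return votes.most_common(1)[0][0]


# Toy posteriors for two languages over three frames (made-up values).
frames = [
    [0.51, 0.49],
    [0.51, 0.49],
    [0.01, 0.99],
]
log_avg_choice, _ = combine_log_average(frames)
vote_choice = combine_majority_vote(frames)
```

On these toy values the two rules disagree: voting picks language 0 because it narrowly wins two of three frames, while log averaging picks language 1 because one frame is very confident. Either rule can be updated incrementally as each new frame arrives, which fits the real-time setting the abstract describes.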

Details

Language :
English
ISSN :
08936080
Volume :
64
Database :
Academic Search Index
Journal :
Neural Networks
Publication Type :
Academic Journal
Accession number :
100797575
Full Text :
https://doi.org/10.1016/j.neunet.2014.08.006