Advances in transcription of broadcast news and conversational telephone speech within the combined EARS BBN/LIMSI system

Authors :
Richard Schwartz
Bing Xiang
Jeff Z. Ma
Fabrice Lefèvre
Lori Lamel
Rohit Prasad
Jean-Luc Gauvain
Spyros Matsoukas
Holger Schwenk
Chia-Lin Kao
Owen Kimball
Thomas Colthurst
Long Nguyen
John Makhoul
Gilles Adda
Déposants HAL-Avignon, bibliothèque Universitaire
Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI)
Université Paris Saclay (COmUE)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université - UFR d'Ingénierie (UFR 919)
Sorbonne Université (SU)-Sorbonne Université (SU)-Université Paris-Saclay-Université Paris-Sud - Paris 11 (UP11)
Institut des Technologies Multilingues et Multimédias de l'Information (IMMI)
Centre National de la Recherche Scientifique (CNRS)
Laboratoire Informatique d'Avignon (LIA)
Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
Department of Mathematics, Brown University
Brown University
Laboratoire d'Informatique de l'Université du Mans (LIUM)
Le Mans Université (UM)
Source :
IEEE Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2006
Publication Year :
2006
Publisher :
Institute of Electrical and Electronics Engineers (IEEE), 2006.

Abstract

This paper describes the progress made in the transcription of broadcast news (BN) and conversational telephone speech (CTS) within the combined BBN/LIMSI system from May 2002 to September 2004. During that period, BBN and LIMSI collaborated to produce significant reductions in the word error rate (WER), as directed by the aggressive goals of the Defense Advanced Research Projects Agency (DARPA) Effective, Affordable, Reusable Speech-to-text (EARS) program. The paper focuses on general modeling techniques that led to recognition accuracy improvements, as well as engineering approaches that enabled efficient use of large amounts of training data and fast decoding architectures. Special attention is given to efforts to integrate components of the BBN and LIMSI systems, including the tradeoff between speed and accuracy for various system combination strategies. Results on the EARS progress test sets show that the combined BBN/LIMSI system achieved relative WER reductions of 47% and 51% on the BN and CTS domains, respectively.
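
The abstract reports its results as relative reductions in WER. As a reading aid, the following is a minimal sketch of how WER and a relative reduction are commonly computed; it is not taken from the paper. The function names `word_error_rate` and `relative_reduction`, the example sentences, and the 20.0 and 15.0 figures are all hypothetical illustrations and are not results from the BBN/LIMSI system.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Computed as the word-level Levenshtein distance between the two
    transcripts, divided by the number of words in the reference.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER that was removed."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Toy transcripts (hypothetical): one substitution and one deletion
    # against a six-word reference.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {wer:.3f}")

    # Hypothetical WER values in percent, chosen only to show the arithmetic.
    print(f"Relative reduction: {relative_reduction(20.0, 15.0):.0%}")
```

Under this definition, a relative reduction is measured against the baseline error rate rather than in absolute percentage points, so the same relative figure can correspond to very different absolute changes in different domains.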

Details

ISSN :
1558-7916
Volume :
14
Database :
OpenAIRE
Journal :
IEEE Transactions on Audio, Speech and Language Processing
Accession number :
edsair.doi.dedup.....7290b91cb5bb690beb1ed1eca6943d5b
Full Text :
https://doi.org/10.1109/tasl.2006.878257