1. Advances in transcription of broadcast news and conversational telephone speech within the combined EARS BBN/LIMSI system
- Author
Richard Schwartz, Bing Xiang, Jeff Z. Ma, Fabrice Lefèvre, Lori Lamel, Rohit Prasad, Jean-Luc Gauvain, Spyros Matsoukas, Holger Schwenk, Chia-Lin Kao, Owen Kimball, Thomas Colthurst, Long Nguyen, John Makhoul, Gilles Adda, Déposants HAL-Avignon, bibliothèque Universitaire, Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), Université Paris Saclay (COmUE)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université - UFR d'Ingénierie (UFR 919), Sorbonne Université (SU)-Sorbonne Université (SU)-Université Paris-Saclay-Université Paris-Sud - Paris 11 (UP11), Institut des Technologies Multilingues et Multimédias de l'Information (IMMI), Centre National de la Recherche Scientifique (CNRS), Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI, Department of Mathematics, Brown University, Brown University, Laboratoire d'Informatique de l'Université du Mans (LIUM), and Le Mans Université (UM)
- Subjects
Acoustics and Ultrasonics, business.industry, Computer science, Speech recognition, Speech coding, Word error rate, 020206 networking & telecommunications, Speech synthesis, 02 engineering and technology, [INFO] Computer Science [cs], Broadcasting, Speech processing, computer.software_genre, Cable television, 030507 speech-language pathology & audiology, 03 medical and health sciences, 0202 electrical engineering, electronic engineering, information engineering, Telephony, Electrical and Electronic Engineering, Transcription (software), 0305 other medical science, business, computer, ComputingMilieux_MISCELLANEOUS
- Abstract
This paper describes the progress made in the transcription of broadcast news (BN) and conversational telephone speech (CTS) within the combined BBN/LIMSI system from May 2002 to September 2004. During that period, BBN and LIMSI collaborated in an effort to produce significant reductions in the word error rate (WER), as directed by the aggressive goals of the Effective, Affordable, Reusable, Speech-to-text [Defense Advanced Research Projects Agency (DARPA) EARS] program. The paper focuses on general modeling techniques that led to recognition accuracy improvements, as well as engineering approaches that enabled efficient use of large amounts of training data and fast decoding architectures. Special attention is given to efforts to integrate components of the BBN and LIMSI systems, discussing the tradeoff between speed and accuracy for various system combination strategies. Results on the EARS progress test sets show that the combined BBN/LIMSI system achieved relative WER reductions of 47% and 51% on the BN and CTS domains, respectively.
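The abstract reports its results as relative reductions in word error rate. As a rough illustration of how such figures are usually derived, the sketch below computes WER as the word-level edit distance divided by the reference length, and a relative reduction from two WER values. The function names and the example sentences are hypothetical and are not taken from the paper, which does not give its scoring code or absolute WER values in this abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline error removed, e.g. 0.5 for a 50% relative reduction."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical sentences, used only to exercise the functions.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {wer:.3f}")
    # Hypothetical WER values, not figures from the paper.
    print(f"Relative reduction: {relative_reduction(20.0, 10.0):.0%}")
```

A relative reduction is measured against the baseline error rate rather than in absolute points, so a 47% relative reduction means the new WER is 53% of the earlier one.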
- Published
- 2006