Contemporaneous text as side-information in statistical language modeling
- Source :
- Computer Speech & Language. 18:143-162
- Publication Year :
- 2004
- Publisher :
- Elsevier BV, 2004.
Abstract
- We propose new methods to exploit contemporaneous text, such as on-line news articles, to improve language models for automatic speech recognition and other natural language processing applications. In particular, we investigate the use of text from a resource-rich language to sharpen language models for processing a news story or article in a language with scarce linguistic resources. We demonstrate that even with fairly crude cross-language information retrieval and simple machine translation, one can construct story-specific Chinese language models which exploit cues from a side-corpus of English newswire to significantly improve the performance of language models estimated from a static Chinese corpus. Our investigations cover cases when the amount of available Chinese text is small, and a case when a large Chinese text corpus is available. We examine the effectiveness of our techniques both when the side-corpus contains English documents that are near-translations of the Chinese documents being processed, and when the English side-corpus is merely from contemporaneous and independent news sources. We present experimental results for automatic transcription of speech from the Mandarin Broadcast News corpus.
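The abstract describes building a story-specific Chinese language model from English side-text and combining it with a model estimated from a static Chinese corpus. The sketch below illustrates one common way to do that at the unigram level: project English word counts into Chinese through a translation table, then linearly interpolate with the static model. The function names, the toy translation table, the romanized placeholder tokens, and the weight `lam` are all illustrative assumptions; the abstract does not specify the exact estimator or combination scheme used in the paper.

```python
from collections import Counter


def story_unigram(english_tokens, translation_table):
    """Project English unigram counts into target-language probabilities.

    translation_table maps an English word to {target_word: p(target | english)}.
    This table is a hypothetical stand-in for "simple machine translation".
    """
    counts = Counter(english_tokens)
    total = sum(counts.values())
    probs = {}
    for e_word, n in counts.items():
        for t_word, p in translation_table.get(e_word, {}).items():
            probs[t_word] = probs.get(t_word, 0.0) + p * n / total
    # English words missing from the table contribute nothing, so renormalize.
    mass = sum(probs.values())
    if mass > 0:
        probs = {w: p / mass for w, p in probs.items()}
    return probs


def interpolate(static_lm, story_lm, lam):
    """Linear interpolation: lam * story-specific + (1 - lam) * static."""
    vocab = set(static_lm) | set(story_lm)
    return {
        w: lam * story_lm.get(w, 0.0) + (1.0 - lam) * static_lm.get(w, 0.0)
        for w in vocab
    }


# Toy data (assumed for illustration only; romanized placeholder tokens).
static_lm = {"gushi": 0.1, "tianqi": 0.9}
table = {
    "stock": {"gushi": 1.0},
    "market": {"gushi": 0.5, "shichang": 0.5},
}
english_side_text = ["stock", "market"]

story_lm = story_unigram(english_side_text, table)
adapted_lm = interpolate(static_lm, story_lm, lam=0.5)
```

In this toy case the English side-text shifts probability toward the finance-related tokens, which is the kind of story-specific sharpening the abstract refers to. In practice the interpolation weight would be tuned on held-out data.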
- Subjects :
- Text corpus
Language identification
Computer science
Theoretical Computer Science
Human-Computer Interaction
Universal Networking Language
Transcription (linguistics)
Corpus linguistics
Cache language model
ComputingMethodologies_DOCUMENTANDTEXTPROCESSING
Language model
Artificial intelligence
Computational linguistics
Software
Natural language processing
Details
- ISSN :
- 0885-2308
- Volume :
- 18
- Database :
- OpenAIRE
- Journal :
- Computer Speech & Language
- Accession number :
- edsair.doi...........651901b88b0956be9ea1c4415995786a
- Full Text :
- https://doi.org/10.1016/j.csl.2003.09.001