101. Class-Based Language Model Adaptation.
- Author
Emele, Martin C., Valsan, Zica, Lam, Yin Hay, and Goronzy, Silke
- Abstract
In this paper we introduce and evaluate two class-based language model adaptation techniques for adapting general n-gram-based background language models to a specific spoken dialogue task. The required background language models are derived from available newspaper corpora and Internet newsgroup collections. We followed a standard mixture-based approach for language model adaptation, generating several clusters of topic-specific language models and combining them into a specific target language model using different weights depending on the chosen application domain. In addition, we developed a novel word n-gram pruning technique for domain adaptation and proposed a new approach to thematic text clustering. This method relies on a new discriminative n-gram-based key term selection process for document clustering; the selected key terms are then used to automatically cluster the whole document collection. By selecting only relevant text clusters for language model training, we addressed the problem of generating task-specific language models. Different key term selection methods are investigated using perplexity as the evaluation measure. Automatically computed clusters are compared with manually labeled genre clusters, and the results show a significant performance improvement that depends on the chosen key term selection method.
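The mixture-based adaptation the abstract describes is, at its core, a linear interpolation of cluster-specific language models, with perplexity as the yardstick. The sketch below illustrates that idea only: the unigram models, toy corpora, add-one smoothing, and weight settings are assumptions made for illustration, not the paper's actual setup, which uses word n-grams trained on newspaper and newsgroup clusters and a discriminative key-term clustering step not shown here.

```python
# Minimal sketch of mixture-based language model adaptation: several
# topic-cluster models are linearly interpolated into one target model,
# and perplexity measures the fit on in-domain text.
# All corpora, weights, and model choices are illustrative assumptions.
import math
from collections import Counter

def train_unigram(tokens, vocab):
    """Unigram model with add-one smoothing over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def mixture_prob(word, models, weights):
    """Linear interpolation: P(w) = sum_i lambda_i * P_i(w)."""
    return sum(lam * m[word] for lam, m in zip(weights, models))

def perplexity(tokens, models, weights):
    """Perplexity = exp(-(1/N) * sum_w log P(w))."""
    log_sum = sum(math.log(mixture_prob(w, models, weights)) for w in tokens)
    return math.exp(-log_sum / len(tokens))

# Toy cluster corpora standing in for newspaper / newsgroup clusters.
clusters = [
    "stocks fell as markets reacted to rates".split(),
    "please book a table for dinner tonight".split(),
]
test = "book a table tonight".split()  # in-domain dialogue-task text
vocab = set(w for c in clusters for w in c) | set(test)

models = [train_unigram(c, vocab) for c in clusters]
# Shifting weight toward the dialogue-like cluster should lower
# perplexity on the dialogue-task test text.
for weights in ([0.5, 0.5], [0.1, 0.9]):
    print(weights, round(perplexity(test, models, weights), 2))
```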
- Published
- 2006