
Exploiting Category Information and Document Information to Improve Term Weighting for Text Categorization

Authors :
Jingyang Li
Maosong Sun
Source :
Computational Linguistics and Intelligent Text Processing ISBN: 9783540709381, CICLing
Publication Year :
2007
Publisher :
Springer Berlin Heidelberg, 2007.

Abstract

Traditional tfidf-like term weighting schemes use a rough statistic, idf, as the term weighting factor. This factor does not exploit the category information (category labels on documents) or the intra-document information (the relative importance of a given term to a given document that contains it) available in the training data for a text categorization task. We present here a more elaborate nonparametric probabilistic model that makes use of this information in the term weighting phase. idf is theoretically proved to be a rough approximation of this new term weighting factor. This work is preliminary and mainly aims to provide inspiration for further study on exploiting this information, but it already provides a moderate performance boost on three popular document collections.
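For orientation, the sketch below shows the traditional tf·idf baseline the abstract refers to. It is not the authors' proposed probabilistic model, which the abstract does not specify. It assumes the common textbook form idf(t) = log(N / df(t)) with raw term counts as tf. The function names and the labeled toy corpus are hypothetical and serve only to illustrate the abstract's point: idf is computed from document frequencies alone, so category labels never influence the weight.

```python
import math
from collections import Counter


def idf(docs):
    """Classic idf(t) = log(N / df(t)); docs is a list of token lists."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    return {t: math.log(n / d) for t, d in df.items()}


def tfidf(doc, idf_table):
    """Raw term frequency times idf for one tokenized document."""
    tf = Counter(doc)
    return {t: c * idf_table.get(t, 0.0) for t, c in tf.items()}


# Hypothetical labeled toy corpus: "goal" occurs only in sports documents,
# while "today" is spread across both categories.
corpus = [
    (["goal", "match", "today"], "sports"),
    (["goal", "team"], "sports"),
    (["vote", "today"], "politics"),
    (["vote", "party"], "politics"),
]
docs = [tokens for tokens, _label in corpus]  # labels are discarded here
table = idf(docs)

# Both terms occur in 2 of 4 documents, so idf assigns them the same weight,
# even though only "goal" is indicative of a category.
print(table["goal"], table["today"])
print(tfidf(docs[0], table))
```

Because `idf` only ever sees `docs`, two terms with equal document frequency receive identical weights no matter how they are distributed across categories. This is the category information that, according to the abstract, the proposed weighting factor is designed to exploit.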

Details

ISBN :
978-3-540-70938-1
ISBNs :
9783540709381
Database :
OpenAIRE
Journal :
Computational Linguistics and Intelligent Text Processing ISBN: 9783540709381, CICLing
Accession number :
edsair.doi...........ebed2853433acf3e5e598e5cea7bf699
Full Text :
https://doi.org/10.1007/978-3-540-70939-8_52