Exploiting Category Information and Document Information to Improve Term Weighting for Text Categorization
- Source :
- Computational Linguistics and Intelligent Text Processing ISBN: 9783540709381, CICLing
- Publication Year :
- 2007
- Publisher :
- Springer Berlin Heidelberg, 2007.
Abstract
- Traditional tfidf-like term weighting schemes rely on a rough statistic, idf, as the term weighting factor. This factor does not exploit the category information (category labels on documents) or the intra-document information (the relative importance of a given term to a given document that contains it) available in the training data for a text categorization task. We present here a more elaborate nonparametric probabilistic model that makes use of this information in the term weighting phase. idf is theoretically proved to be a rough approximation of this new term weighting factor. This work is preliminary and mainly aims to provide inspiration for further study on exploiting this information, but it already provides a moderate performance boost on three popular document collections.
Details
- ISBN :
- 978-3-540-70938-1
- ISBNs :
- 9783540709381
- Database :
- OpenAIRE
- Journal :
- Computational Linguistics and Intelligent Text Processing ISBN: 9783540709381, CICLing
- Accession number :
- edsair.doi...........ebed2853433acf3e5e598e5cea7bf699
- Full Text :
- https://doi.org/10.1007/978-3-540-70939-8_52