1. A comparative study on text representation schemes in text categorization
- Author
Song, Fengxi, Liu, Shuhai, and Yang, Jingyu
- Abstract
It is well known that the classification effectiveness of a text categorization system is not simply a matter of the learning algorithm. Text representation factors are also at work. This paper considers how the effectiveness of text classifiers is linked to five text representation factors: “stop words removal”, “word stemming”, “indexing”, “weighting”, and “normalization”. Statistical analyses of experimental results show that performing “normalization” can always improve the effectiveness of text classifiers significantly. The effects of the other factors are not as great as expected. Contrary to common sense, a simple binary indexing method can sometimes be helpful for text categorization.
- Published
- 2005
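The five representation factors named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stop list, the `naive_stem` suffix stripper (a crude stand-in for a real stemmer), the reading of "indexing" as binary versus term-frequency counts, the idf formula `log(N / df)`, and L2 normalization are all assumptions chosen for illustration, and every function and parameter name is hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy stop list; real systems use much larger lists.
STOP_WORDS = {"the", "a", "of", "is", "and", "to", "in"}


def naive_stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer, not Porter's algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text, remove_stop_words=True, stem=True):
    """Lowercase, split on whitespace, keep alphabetic tokens, then filter and stem."""
    tokens = [t for t in text.lower().split() if t.isalpha()]
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if stem:
        tokens = [naive_stem(t) for t in tokens]
    return tokens


def represent(docs, binary=False, use_idf=True, normalize=True, **tok_opts):
    """Return one {term: weight} dict per document."""
    # Indexing (assumed meaning): binary presence or raw term frequency.
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc, **tok_opts))
        vectors.append({t: (1.0 if binary else float(c)) for t, c in counts.items()})

    # Weighting (assumed scheme): scale each term by idf = log(N / df).
    if use_idf:
        n = len(docs)
        df = Counter(t for v in vectors for t in v)
        vectors = [{t: w * math.log(n / df[t]) for t, w in v.items()} for v in vectors]

    # Normalization (assumed scheme): divide by the vector's L2 norm.
    if normalize:
        normalized = []
        for v in vectors:
            norm = math.sqrt(sum(w * w for w in v.values()))
            normalized.append({t: w / norm for t, w in v.items()} if norm > 0 else dict(v))
        vectors = normalized
    return vectors
```

Each keyword argument toggles one factor, so calling `represent(docs, binary=True, use_idf=False)` and `represent(docs, binary=False, use_idf=True)` on the same corpus gives two representations that could be fed to the same classifier for comparison.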