Spam Filtering Based on Improved CHI Feature Selection Method
- Author
Hongxia Yu, Dongmei Fan, Zhimao Lu, and Chaoyue Yuan
- Subjects
business.industry, Computer Science::Information Retrieval, Feature extraction, Pattern recognition, Feature selection, Filter (signal processing), computer.software_genre, Support vector machine, Cross entropy, F-test, Entropy (information theory), Data mining, Artificial intelligence, business, computer, Mathematics, Statistical hypothesis testing
- Abstract
This paper studies feature selection methods used in spam filtering, including chi-square (CHI), Expected Cross Entropy (ECE), the Weight of Evidence for Text (WET), and Information Gain (IG), and proposes a novel modified CHI feature selection method for spam filtering. A spam filter based on a Support Vector Machine (SVM) is used to evaluate CHI, ECE, WET, IG, and the modified CHI. The experiments show that the modified CHI improves the precision, recall, and F-measure of the spam filter, indicating that the modified CHI feature selection method is effective.
- Published
- 2009
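The abstract compares several term-scoring methods but does not give the modified CHI formula, so the sketch below shows only the standard, textbook forms of three of the baselines (CHI, IG, and ECE) for a two-class spam/ham task. Each score is computed from a 2×2 document-count table for one term: `a` spam documents containing the term, `b` ham documents containing it, `c` spam documents without it, and `d` ham documents without it. The function names, the count layout, and the example counts are hypothetical illustrations, not the paper's modified method, its data, or its results; WET is not covered here.

```python
import math


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Standard CHI score: N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)).

    a: spam docs containing the term   b: ham docs containing the term
    c: spam docs without the term      d: ham docs without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # a class is empty, or the term is in all/no documents
    return n * (a * d - c * b) ** 2 / denom


def _entropy(counts) -> float:
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(k / total * math.log2(k / total) for k in counts if k > 0)


def information_gain(a: int, b: int, c: int, d: int) -> float:
    """IG = H(class) - [P(t) H(class | t) + P(not t) H(class | not t)]."""
    n = a + b + c + d
    h_class = _entropy([a + c, b + d])
    h_cond = (a + b) / n * _entropy([a, b]) + (c + d) / n * _entropy([c, d])
    return h_class - h_cond


def expected_cross_entropy(a: int, b: int, c: int, d: int) -> float:
    """ECE = P(t) * sum over classes of P(c|t) * log2(P(c|t) / P(c))."""
    n = a + b + c + d
    df = a + b  # number of documents containing the term
    if df == 0:
        return 0.0
    total = 0.0
    for in_class, class_size in ((a, a + c), (b, b + d)):
        if in_class == 0:
            continue  # 0 * log(0) is treated as 0
        p_c_given_t = in_class / df
        p_c = class_size / n
        total += p_c_given_t * math.log2(p_c_given_t / p_c)
    return df / n * total


def top_k(term_counts, k, score=chi_square):
    """Keep the k terms with the highest score (hypothetical helper)."""
    return sorted(term_counts, key=lambda t: score(*term_counts[t]), reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical counts from 100 spam and 100 ham messages.
    table = {"free": (60, 10, 40, 90), "meeting": (5, 45, 95, 55), "the": (50, 50, 50, 50)}
    for fn in (chi_square, information_gain, expected_cross_entropy):
        print(fn.__name__, top_k(table, 2, fn))
```

A term spread evenly across both classes (like `"the"` above) scores zero under all three measures, so it is dropped before the selected terms are passed to a classifier such as the SVM mentioned in the abstract.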