1. Enhance Term Weighting Algorithm as Feature Selection Technique for Illicit Web Content Classification
- Author
- Ali Selamat, Mohd Aizaini Maarof, Zhi-Sam Lee, and Siti Mariyam Shamsuddin
- Subjects
Information retrieval, Artificial neural network, Computer science, Feature selection, Information security, Machine learning, Weighting, Web page, Entropy (information theory), The Internet, Web content, Artificial intelligence
- Abstract
The exponential growth of information on the Internet has raised the issue of information security. Pornographic Web content is one of the most harmful resources, polluting the minds of children and teenagers. Several Web content analysis approaches have been proposed to prevent children from accessing such illicit Web content. However, implementing each of these solutions remains an issue. Most of the approaches are weak at classifying highly similar Web content, such as pornography and gynecology Web pages. In this study, we try to solve this issue by proposing a modified term weighting scheme that is used as a term feature selection technique for illicit Web page classification. We examine the performance of the proposed technique on three data sets, which represent three critical scenarios, and compare it with the original term weighting scheme. Based on our observations, the proposed technique shows its superiority for illicit Web page classification, achieving an average accuracy rate higher than 90%. The experimental results also indicate that the proposed technique improves on the original term weighting scheme. We hope that this study will give insight to other researchers, especially those who work in similar areas.
- Published
- 2008