4 results for "ZHI SAM LEE"
Search Results
2. Text Content Analysis For Illicit Web Pages By Using Neural Networks
- Author
- Ali Selamat, Zhi Sam Lee, Mohd Aizaini Maarof, and Siti Mariyam Shamsuddin
- Subjects
Information retrieval, Artificial neural network, business.industry, Computer science, Content analysis, Web page, General Engineering, Artificial intelligence, business
- Abstract
Illicit web content such as pornography, violence, and gambling has greatly polluted the minds of web users, especially children and teenagers. Because popular web filtering techniques such as Uniform Resource Locator (URL) blocking and Platform for Internet Content Selection (PICS) checking are ineffective against today's dynamic web content, effective content-based analysis techniques are highly desired. In this paper, we propose a textual content analysis model that uses an entropy term weighting scheme to classify pornography and sex education web pages. We compare the entropy scheme with two other common term weighting schemes, TFIDF and Glasgow. These techniques were tested with an artificial neural network on a small class dataset. We found that the proposed model achieves better performance in terms of accuracy, convergence speed, and stability than the other techniques.
Key words: Artificial neural network; term weighting scheme; textual content analysis; web page classification
- Published
- 2012
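The abstract above names an entropy term weighting scheme but does not give its formula. As a rough illustration only, the sketch below uses the standard log-entropy formulation from the information retrieval literature, which is an assumption and may differ from the authors' exact scheme. The function name `entropy_weights` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def entropy_weights(docs):
    """Log-entropy term weights for a list of tokenized documents.

    Assumed formulation (not confirmed by the abstract):
      local weight  = log(1 + tf)
      global weight = 1 + sum_j(p_j * log(p_j)) / log(N)
    where p_j is the share of a term's occurrences that fall in document j.
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]

    # Total occurrences of each term across the whole collection.
    global_freq = Counter()
    for c in counts:
        global_freq.update(c)

    global_weight = {}
    for term, gf in global_freq.items():
        ent = 0.0
        for c in counts:
            tf = c.get(term, 0)
            if tf:
                p = tf / gf
                ent += p * math.log(p)
        # With a single document there is no spread to measure.
        global_weight[term] = 1.0 + ent / math.log(n_docs) if n_docs > 1 else 1.0

    return [
        {term: math.log(1 + tf) * global_weight[term] for term, tf in c.items()}
        for c in counts
    ]

# Toy usage: "a" occurs evenly in both documents, "b" and "c" in only one.
weights = entropy_weights([["a", "b"], ["a", "c"]])
```

Under this formulation a term spread evenly over every document gets a weight near zero, while a term concentrated in one document keeps its full local weight, which is why such a scheme can help separate closely related classes of pages.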
3. Enhance Term Weighting Algorithm as Feature Selection Technique for Illicit Web Content Classification
- Author
- Ali Selamat, Mohd Aizaini Maarof, Zhi-Sam Lee, and Siti Mariyam Shamsuddin
- Subjects
Information retrieval, Artificial neural network, Computer science, business.industry, Feature selection, Information security, Machine learning, computer.software_genre, Weighting, Web page, Entropy (information theory), The Internet, Web content, Artificial intelligence, business, computer
- Abstract
The exponential increase of information on the Internet has raised the issue of information security. Pornographic Web content is one of the most harmful resources polluting the minds of children and teenagers. Several Web content-based analysis approaches have been proposed to prevent children from accessing this illicit Web content. However, the implementation of each solution remains an issue. Most of the approaches are weak at classifying highly similar Web content such as pornography and gynecology Web pages. In this study, we try to solve this issue by proposing a modified term weighting scheme that is used as a term feature selection technique for illicit Web page classification. We examine the performance of the proposed technique on three data sets, which represent three critical scenarios, and compare it with the original term weighting scheme. Based on our observations, the proposed technique shows its superiority for illicit Web page classification, achieving an accuracy rate higher than 90% on average. The experimental results also indicate that the proposed technique improves on the original term weighting scheme. We hope that this study gives insight to other researchers, especially those who work in similar areas.
- Published
- 2008
4. Language Identifications of Arabic Script Web Documents Using Independent Component Analysis
- Author
- Zhi-Sam Lee and Ali Selamat
- Subjects
Language identification, Computer science, business.industry, Feature selection, computer.software_genre, language.human_language, ComputingMethodologies_PATTERNRECOGNITION, Web page, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, language, Artificial intelligence, Urdu, Computational linguistics, Document retrieval, business, computer, Arabic script, Natural language processing, Persian
- Abstract
We analyze the language identification algorithms used to identify Arabic script Web documents, such as Arabic, Jawi, Persian, and Urdu, using independent component analysis (ICA). We use a combination of the entropy term weighting scheme and class based feature (CPBF) vectors as feature selection methods to select the best features of Arabic script Web documents for Web page language identification. We then input the selected features based on the identification of latent semantics of user profiles using singular value decomposition (SVD). The SVD is used to remove noise from the retrieved documents before applying ICA for topic extraction. We assume that the topics of the documents are independent of each other. We use the information retrieval measures precision, recall, and F1 to evaluate the effectiveness of the proposed algorithm. From the experiments, we found that the proposed method leads to good Arabic script language identification results, with good separation of the Arabic, Persian, and Urdu languages using ICA.
- Published
- 2008
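The abstract above evaluates language identification with precision, recall, and an F measure. A minimal sketch of how those measures are usually computed per language follows, assuming the F measure is the standard F1 score (the harmonic mean of precision and recall). The function name and the sample labels are hypothetical and are not data from the paper.

```python
def precision_recall_f1(true_labels, predicted_labels, target):
    """Per-class precision, recall, and F1 for one target language."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)

    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Hypothetical labels for four documents.
true_langs = ["arabic", "arabic", "persian", "urdu"]
pred_langs = ["arabic", "persian", "persian", "urdu"]
arabic_scores = precision_recall_f1(true_langs, pred_langs, "arabic")
```

Scoring each language separately makes it visible when closely related scripts, such as Arabic and Persian in this toy example, are confused with one another.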