
ARABIC SCRIPT WEB PAGE LANGUAGE IDENTIFICATION USING HYBRID-KNN METHOD.

Authors :
SELAMAT, ALI
SUBROTO, IMAM MUCH IBNU
CHOON-CHING NG
Source :
International Journal of Computational Intelligence & Applications. Sep 2009, Vol. 8, Issue 3, pp. 315-343. 29p. 1 Black and White Photograph, 8 Diagrams, 12 Charts, 4 Graphs.
Publication Year :
2009

Abstract

In this paper, we propose hybrid-KNN methods for Arabic script web page language identification. A crucial task in text-based language identification among languages that share the same script is producing reliable features while coping with the huge number of languages in the world. Specifically, this involves the issues of feature representation, feature selection, identification performance, retrieval performance, and noise tolerance. We therefore evaluate a number of methods in this work: k-nearest neighbor (KNN), support vector machine (SVM), backpropagation neural networks (BPNN), hybrid KNN-SVM, and hybrid KNN-BPNN, in order to assess the capability of state-of-the-art methods. KNN is prominent in data clustering and data filtering, SVM and BPNN are well known in supervised classification, and we propose hybrid-KNN for noise removal in web page language identification. We use the standard measurements of accuracy, precision, recall, and F1 to evaluate the effectiveness of the proposed hybrid-KNN. In the experiments, we observed that BPNN produces precise identification when the given data set is clean. However, as the level of noise in the training data increases, KNN-SVM performs better than KNN-BPNN against misclassified data, even at a noise level of 50%. This shows that KNN-SVM produces promising identification performance, in which KNN reduces the noise in the data set and SVM is reliable for language identification. [ABSTRACT FROM AUTHOR]
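
The abstract describes KNN as a noise-removal stage ahead of a supervised classifier, scored with precision, recall, and F1. The following is a minimal sketch of that idea, not the authors' implementation. It assumes that "noise removal" means discarding training samples whose label disagrees with the majority label of their k nearest neighbours, which the abstract does not state. The function names `knn_filter` and `precision_recall_f1`, the two-dimensional feature vectors, and the labels are all hypothetical. The filtered set would then be passed to any supervised learner such as an SVM or BPNN.

```python
import math
from collections import Counter


def knn_filter(samples, labels, k=3):
    """Return indices of samples whose label matches the majority
    label of their k nearest neighbours (Euclidean distance).

    Assumption: this leave-one-out majority vote stands in for the
    paper's KNN noise-removal step, whose details are not in the abstract.
    """
    kept = []
    for i, (x, y) in enumerate(zip(samples, labels)):
        # Distance from sample i to every other sample, nearest first.
        neighbours = sorted(
            (math.dist(x, other), labels[j])
            for j, other in enumerate(samples)
            if j != i
        )[:k]
        majority, _ = Counter(lbl for _, lbl in neighbours).most_common(1)[0]
        if majority == y:
            kept.append(i)
    return kept


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data: two well-separated clusters plus one
# mislabelled point sitting inside the first cluster.
samples = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (0.5, 0.5)]
labels = ["arabic"] * 4 + ["persian"] * 4 + ["persian"]

clean_idx = knn_filter(samples, labels, k=3)
clean_samples = [samples[i] for i in clean_idx]
clean_labels = [labels[i] for i in clean_idx]
```

In this toy set the mislabelled point at (0.5, 0.5) is surrounded by "arabic" neighbours, so the filter drops it and keeps the other eight samples. A larger k makes the filter more conservative about what counts as a consistent label, at the cost of possibly removing genuine samples near class boundaries.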

Details

Language :
English
ISSN :
14690268
Volume :
8
Issue :
3
Database :
Academic Search Index
Journal :
International Journal of Computational Intelligence & Applications
Publication Type :
Academic Journal
Accession number :
43919849
Full Text :
https://doi.org/10.1142/S146902680900262X