Comparative Analysis of Document level Text Classification Algorithms using R

Authors :
Syamala, Maganti
J, Dr N
Maguluri, Lakshamanaphaneendra
Ragupathy, R
Source :
IOP Conference Series: Materials Science and Engineering; August 2017, Vol. 225 Issue: 1 p012076-012076, 1p
Publication Year :
2017

Abstract

Over the past few decades, tremendous volumes of data have become available on the Internet in both structured and unstructured form. Because this information continues to grow exponentially, there is an urgent need for text classifiers. Text mining is an interdisciplinary field that draws on information retrieval, data mining, machine learning, statistics, and computational linguistics. To handle this growth, a wide range of supervised learning algorithms has been introduced. Among these, K-Nearest Neighbor (KNN) is an efficient classifier and the simplest in the text classification family. However, KNN suffers from imbalanced class distributions and noisy term features. To cope with this challenge, we use document-based centroid dimensionality reduction (CentroidDR), implemented in R. By combining two text classification techniques, the KNN and centroid classifiers, we propose a scalable and effective flat classifier called MCenKNN, which performs substantially better than CenKNN.
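
The general idea of pairing centroid-based dimensionality reduction with KNN can be sketched as follows. This is only a minimal illustration, not the authors' MCenKNN, whose details the abstract does not give. It assumes that CentroidDR means representing each document by its cosine similarity to every class centroid, so that the reduced space has one dimension per class. It is written in Python rather than R, and every function name and the toy term-count data are hypothetical.

```python
import math
from collections import Counter


def normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def cosine(a, b):
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


def class_centroids(docs, labels):
    """Assumed CentroidDR step 1: one unit-length centroid per class."""
    sums = {}
    for doc, label in zip(docs, labels):
        unit = normalize(doc)
        current = sums.get(label, [0.0] * len(doc))
        sums[label] = [s + u for s, u in zip(current, unit)]
    # Sort by label so the reduced dimensions have a stable order.
    return {label: normalize(total) for label, total in sorted(sums.items())}


def centroid_reduce(doc, centroids):
    """Assumed CentroidDR step 2: map a document to its centroid similarities."""
    unit = normalize(doc)
    return [cosine(unit, c) for c in centroids.values()]


def knn_predict(query, train_points, train_labels, k=3):
    """Plain majority-vote KNN with Euclidean distance in the reduced space."""
    ranked = sorted(
        (math.dist(query, point), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus: term counts over a four-term vocabulary.
docs = [
    [3, 1, 0, 0], [2, 2, 0, 0], [4, 0, 1, 0],   # "sports" documents
    [0, 0, 3, 1], [0, 1, 2, 2], [0, 0, 1, 4],   # "tech" documents
]
labels = ["sports", "sports", "sports", "tech", "tech", "tech"]

centroids = class_centroids(docs, labels)
reduced = [centroid_reduce(d, centroids) for d in docs]  # 4 dims -> 2 dims

query = [2, 1, 0, 0]
predicted = knn_predict(centroid_reduce(query, centroids), reduced, labels, k=3)
print(predicted)
```

Under this assumption, the neighbor search runs in a space whose dimension equals the number of classes rather than the vocabulary size, which is one plausible reading of how the reduction limits the effect of noisy term features.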

Details

Language :
English
ISSN :
1757-8981 and 1757-899X
Volume :
225
Issue :
1
Database :
Supplemental Index
Journal :
IOP Conference Series: Materials Science and Engineering
Publication Type :
Periodical
Accession number :
ejs43139436