Comparative Analysis of Document level Text Classification Algorithms using R
- Source :
- IOP Conference Series: Materials Science and Engineering; August 2017, Vol. 225 Issue: 1 p012076-012076, 1p
- Publication Year :
- 2017
Abstract
- Over the past few decades, tremendous volumes of data have become available on the Internet in both structured and unstructured form. Because information on the Internet is also growing exponentially, there is an urgent need for text classifiers. Text mining is an interdisciplinary field that draws on information retrieval, data mining, machine learning, statistics and computational linguistics. To meet this need, a wide range of supervised learning algorithms has been introduced. Among these, K-Nearest Neighbor (KNN) is one of the simplest and most efficient classifiers in the text classification family. However, KNN suffers from imbalanced class distributions and noisy term features. To cope with this challenge, we use document-based centroid dimensionality reduction (CentroidDR), implemented in R. By combining these two text classification techniques, the KNN and centroid classifiers, we propose a scalable and effective flat classifier, called MCenKNN, which performs substantially better than CenKNN.
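The abstract does not spell out how MCenKNN works, so the following is only a minimal sketch of the general idea it names: reduce each document to its similarities with the class centroids, then run KNN in that low-dimensional space. It is written in Python rather than the paper's R, and every name in it (`cosine`, `class_centroids`, `project`, `cen_knn_predict`), the toy term counts, the use of cosine similarity and the choice of `k` are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length term vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def class_centroids(X, y):
    """Return the sorted class labels and the mean term vector of each class."""
    labels = sorted(set(y))
    centroids = {}
    for label in labels:
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return labels, centroids


def project(x, labels, centroids):
    """Reduce a document to one feature per class: its similarity to that centroid."""
    return [cosine(x, centroids[label]) for label in labels]


def cen_knn_predict(X, y, query, k=3):
    """Classify `query` by majority vote among its k nearest training
    documents, measured in the centroid-similarity space (assumed design)."""
    labels, centroids = class_centroids(X, y)
    train_z = [project(x, labels, centroids) for x in X]
    query_z = project(query, labels, centroids)
    nearest = sorted(zip(train_z, y), key=lambda p: math.dist(p[0], query_z))[:k]
    return Counter(lab for _, lab in nearest).most_common(1)[0][0]


# Toy term counts over a hypothetical vocabulary: goal, match, vote, party
X = [[3, 2, 0, 0], [2, 3, 0, 1], [0, 0, 3, 2], [1, 0, 2, 3]]
y = ["sport", "sport", "politics", "politics"]

print(cen_knn_predict(X, y, [2, 1, 0, 0]))
print(cen_knn_predict(X, y, [0, 0, 1, 2]))
```

In this sketch the reduced space has one dimension per class regardless of vocabulary size, which is what makes the subsequent neighbor search cheap; how the published method handles class imbalance and noisy terms beyond this step is not described in the record.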
Details
- Language :
- English
- ISSN :
- 17578981 and 1757899X
- Volume :
- 225
- Issue :
- 1
- Database :
- Supplemental Index
- Journal :
- IOP Conference Series: Materials Science and Engineering
- Publication Type :
- Periodical
- Accession number :
- ejs43139436