OPTIMAL FEATURE SUBSET SELECTION BASED ON COMBINING DOCUMENT FREQUENCY AND TERM FREQUENCY FOR TEXT CLASSIFICATION.

Authors :
KARPAGALINGAM, Thirumoorthy
KARUPPAIAH, Muneeswaran
Source :
Computing & Informatics; 2020, Vol. 39 Issue 5, p881-906, 26p
Publication Year :
2020

Abstract

Feature selection plays a vital role in reducing the high dimensionality of the feature space in text document classification. Reducing the dimensionality of the feature space lowers the computational cost and improves the accuracy of the text classification system. Hence, a proper subset of the significant features of the text corpus must be identified in order to classify the data in less computational time and with higher accuracy. This research proposes a novel feature selection method that combines document frequency and term frequency (FS-DFTF) to measure the significance of a term. The optimal feature subset selected by the proposed method is evaluated using Naive Bayes and Support Vector Machine classifiers on several popular benchmark text corpora. The experimental results confirm that the proposed method achieves better classification accuracy than other feature selection techniques. [ABSTRACT FROM AUTHOR]
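
The abstract does not state the FS-DFTF formula, so the following is only a rough sketch of the general idea of ranking terms by a score that combines document frequency and term frequency. The scoring rule (normalized document frequency multiplied by normalized term frequency), the function names df_tf_scores and select_top_k, and the toy corpus are all assumptions made for illustration; they are not the authors' method.

```python
from collections import Counter


def df_tf_scores(docs):
    """Score each term from its document frequency and term frequency.

    ASSUMPTION: score = (DF / number of docs) * (TF / total tokens).
    This product is illustrative only; it is NOT the FS-DFTF formula,
    which the abstract does not give.
    """
    n_docs = len(docs)
    df = Counter()  # number of documents containing each term
    tf = Counter()  # total occurrences of each term in the corpus
    for tokens in docs:
        tf.update(tokens)
        df.update(set(tokens))  # count each term once per document
    total_tokens = sum(tf.values())
    return {t: (df[t] / n_docs) * (tf[t] / total_tokens) for t in tf}


def select_top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy corpus of pre-tokenized documents.
docs = [
    ["cheap", "pills", "cheap", "offer"],
    ["meeting", "agenda", "offer"],
    ["cheap", "offer", "offer"],
]
scores = df_tf_scores(docs)
top = select_top_k(scores, 2)
print(top)
```

In a full pipeline, the selected terms would define the reduced vocabulary passed to a classifier such as Naive Bayes or a Support Vector Machine, as the abstract describes.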

Details

Language :
English
ISSN :
1335-9150
Volume :
39
Issue :
5
Database :
Supplemental Index
Journal :
Computing & Informatics
Publication Type :
Academic Journal
Accession number :
149639580
Full Text :
https://doi.org/10.31577/cai_2020_5_881