Comparison of Sampling Techniques for Imbalanced Data Classification

Authors :
Karn Nasritha
Kittisak Kerdprasop
Nittaya Kerdprasop
Source :
วารสารวิทยาการสารสนเทศและเทคโนโลยีประยุกต์ (Journal of Applied Informatics and Technology), Vol 1, Iss 1, Pp 20-37 (2018)
Publication Year :
2018
Publisher :
Faculty of Informatics, 2018.

Abstract

Imbalanced data is a common problem in machine learning classification and tends to lower classification performance. Random sampling techniques have been applied in several ways to address the poor performance caused by class imbalance. This research compares sampling techniques for imbalanced data classification. The study was conducted on three data sets, applying the synthetic minority over-sampling technique (SMOTE), under-sampling, and resample techniques for imbalanced data preprocessing. Decision tree, CART, random forest, support vector machine, and artificial neural network algorithms were ensembled with AdaBoost and bagging to build classification models. Ten-fold cross-validation was used to evaluate the models, with performance measured by precision, recall, and F-measure. The results showed that the resample techniques improved classification of the imbalanced data more than SMOTE did. In addition, the random forest model, the AdaBoost ensemble with random forest, and the bagging ensemble with random forest performed well for data classification in this research.
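
As an illustration of the kind of preprocessing and evaluation the abstract describes, the following is a minimal Python sketch of random under-sampling, random over-sampling with replacement, and the precision, recall, and F-measure computation. It is not the authors' implementation: the function names, the toy data, and the choice to balance every class to the same size are assumptions made for this example, and SMOTE (which synthesizes new minority points rather than duplicating existing ones) is not shown.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Map each class label to the list of rows carrying that label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class is as small as the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = min(len(members) for members in groups.values())
    pairs = []
    for label, members in groups.items():
        pairs.extend((row, label) for row in rng.sample(members, target))
    rng.shuffle(pairs)
    return [row for row, _ in pairs], [label for _, label in pairs]


def random_oversample(rows, labels, seed=0):
    """Randomly duplicate rows (with replacement) until every class
    is as large as the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = max(len(members) for members in groups.values())
    pairs = []
    for label, members in groups.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        pairs.extend((row, label) for row in members + extra)
    rng.shuffle(pairs)
    return [row for row, _ in pairs], [label for _, label in pairs]


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F-measure for one class treated as positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data: 8 majority ("neg") rows and 2 minority ("pos") rows.
rows = [[i] for i in range(10)]
labels = ["neg"] * 8 + ["pos"] * 2

_, under_labels = random_undersample(rows, labels)
_, over_labels = random_oversample(rows, labels)
print(Counter(under_labels))
print(Counter(over_labels))
```

In a cross-validated experiment such as the one described, sampling of this kind is normally applied to the training folds only, so that the test fold keeps the original class distribution.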

Details

Language :
English, Thai
ISSN :
2630-094X and 2586-8136
Volume :
1
Issue :
1
Database :
Directory of Open Access Journals
Journal :
วารสารวิทยาการสารสนเทศและเทคโนโลยีประยุกต์ (Journal of Applied Informatics and Technology)
Publication Type :
Academic Journal
Accession number :
edsdoj.3938173d0884bbea263b685b7b7fde2
Document Type :
article
Full Text :
https://doi.org/10.14456/jait.2018.2