Comparison of Sampling Techniques for Imbalanced Data Classification
- Source :
- วารสารวิทยาการสารสนเทศและเทคโนโลยีประยุกต์ (Journal of Applied Informatics and Technology), Vol 1, Iss 1, Pp 20-37 (2018)
- Publication Year :
- 2018
- Publisher :
- Faculty of Informatics, 2018.
Abstract
- Class imbalance is a common problem in machine learning classification and leads to poor classification performance. Random sampling techniques have been applied in several ways to address the poor performance caused by imbalance. This research compares sampling techniques for imbalanced data classification. The study was conducted on three data sets, using the synthetic minority over-sampling technique (SMOTE), under-sampling, and resample techniques to preprocess the imbalanced data. Decision tree, CART, random forest, support vector machine, and artificial neural network algorithms were combined with the AdaBoost and bagging ensemble methods to build classification models. Model performance was estimated with ten-fold cross-validation and measured by precision, recall, and F-measure. The results showed that resample techniques handled the imbalanced data better than SMOTE. In addition, the random forest model, the AdaBoost ensemble with random forest, and the bagging ensemble with random forest were effective for data classification in this research.
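
The sampling and evaluation steps named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names (`random_undersample`, `random_oversample`, `precision_recall_f1`) are hypothetical, balancing every class to the smallest or largest class size is an assumption, and SMOTE itself (which synthesizes new minority samples by interpolation) is not shown.

```python
import random


def _group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows so every class shrinks to the smallest class size."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = min(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        for row in rng.sample(members, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows until every class reaches the largest class size."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = max(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        # Draw the missing rows with replacement from the existing ones.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall, and F-measure for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

In a cross-validated experiment, resampling of this kind is normally applied only to the training folds, so that the reported precision, recall, and F-measure reflect the original class distribution of the held-out data.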
Details
- Language :
- English, Thai
- ISSN :
- 2630-094X and 2586-8136
- Volume :
- 1
- Issue :
- 1
- Database :
- Directory of Open Access Journals
- Journal :
- วารสารวิทยาการสารสนเทศและเทคโนโลยีประยุกต์ (Journal of Applied Informatics and Technology)
- Publication Type :
- Academic Journal
- Accession number :
- edsdoj.3938173d0884bbea263b685b7b7fde2
- Document Type :
- article
- Full Text :
- https://doi.org/10.14456/jait.2018.2