Imbalanced data classification using improved synthetic minority over-sampling technique.

Authors :
Anusha, Yamijala
Visalakshi, R.
Srinivas, Konda
Source :
Multiagent & Grid Systems; 2023, Vol. 19 Issue 2, p117-131, 15p
Publication Year :
2023

Abstract

In data mining, deep learning and machine learning models face class imbalance problems, which result in a lower detection rate for minority class samples. An improved Synthetic Minority Over-sampling Technique (SMOTE) is introduced for effective imbalanced data classification. After the raw data are collected from the PIMA, Yeast, E. coli, and Breast Cancer Wisconsin databases, pre-processing is performed using min-max normalization, cleaning, integration, and data transformation techniques to obtain data with better uniqueness, consistency, completeness, and validity. The improved SMOTE algorithm is applied to the pre-processed data to balance the class distribution, and the balanced data are then fed to the machine learning classifiers: Support Vector Machine (SVM), Random Forest, and Decision Tree. Experiments confirmed that the improved SMOTE algorithm with Random Forest attained an area under the curve (AUC) of 94.30%, 91%, 96.40%, and 99.40% on the PIMA, Yeast, E. coli, and Breast Cancer Wisconsin databases, respectively. [ABSTRACT FROM AUTHOR]
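
The abstract does not say how the improved SMOTE differs from the original algorithm, so the following is only a minimal sketch of the first two stages it names, min-max normalization followed by over-sampling, using classic SMOTE (interpolation between a minority sample and one of its k nearest minority neighbours) as a stand-in. The function names, the toy data, and the parameters `k` and `seed` are assumptions made for illustration and are not taken from the paper; the classification stage (SVM, Random Forest, Decision Tree) is left out.

```python
import random


def min_max_normalize(rows):
    """Scale each feature column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def smote(minority, n_synthetic, k=5, seed=0):
    """Classic SMOTE (not the paper's improved variant): each synthetic row
    lies on the segment between a random minority row and one of its
    k nearest minority neighbours (squared Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by distance to the base row.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 6 majority rows, 3 minority rows, 2 features each.
majority = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0],
            [2.5, 14.0], [1.5, 13.0], [3.5, 15.0]]
minority = [[8.0, 30.0], [9.0, 32.0], [8.5, 34.0]]

# Normalize all rows together, then split back into the two classes.
scaled = min_max_normalize(majority + minority)
scaled_majority, scaled_minority = scaled[:6], scaled[6:]

# Generate enough synthetic minority rows to match the majority count.
new_rows = smote(scaled_minority,
                 n_synthetic=len(scaled_majority) - len(scaled_minority),
                 k=2)
balanced_minority = scaled_minority + new_rows
```

In this sketch the balanced classes would then be passed to a classifier. In practice, over-sampling is usually applied to the training split only, so that synthetic rows do not leak into the evaluation data.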

Details

Language :
English
ISSN :
15741702
Volume :
19
Issue :
2
Database :
Complementary Index
Journal :
Multiagent & Grid Systems
Publication Type :
Academic Journal
Accession number :
172805880
Full Text :
https://doi.org/10.3233/MGS-230007