1. An improved SMOTE based on center offset factor and synthesis strategy for imbalanced data classification.
- Author
- Zhang, Ying; Deng, Li; Huang, Hefeng; Wei, Bo
- Subjects
- *MACHINE learning, *PROBLEM solving, *INTERPOLATION, *STATISTICAL sampling, *CLASSIFICATION
- Abstract
- Learning from imbalanced data is an enormous challenge in machine learning. To construct balanced datasets, oversampling techniques have been studied extensively. However, many oversampling methods introduce noisy samples and blur classification boundaries, which leads to overfitting. To solve this problem, this paper proposes a new oversampling method, CS-SMOTE, which synthesizes minority class samples by three-point interpolation. CS-SMOTE is mainly based on the center offset factor and a synthesis strategy. First, the method removes noise samples, calculates the center offset factor, and selects sparsely distributed minority class samples using the K-distance graph technique. Next, new samples are generated from sparse minority samples, random minority samples, and sub-cluster centers, all drawn from the same sub-cluster. Finally, multiple comparative experiments on 18 well-known datasets demonstrate the effectiveness and general applicability of CS-SMOTE for imbalanced data classification. The experiments show that CS-SMOTE outperforms competing methods in terms of classification accuracy while avoiding overfitting. [ABSTRACT FROM AUTHOR]
- Published
- 2024
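
The abstract names three-point interpolation but does not give its formula. The sketch below only illustrates the general idea under a stated assumption: a new sample is a random convex combination of a sparse minority sample, a random minority sample, and a sub-cluster center, with weights drawn from a Dirichlet(1, 1, 1) distribution. The helper name `three_point_sample`, the toy data, and the weighting scheme are hypothetical and are not taken from the paper; noise removal, the center offset factor, and K-distance selection are omitted.

```python
import numpy as np

def three_point_sample(sparse_pt, random_pt, center, rng):
    """Hypothetical helper: random convex combination of three points."""
    # Assumption: Dirichlet(1, 1, 1) weights, which are non-negative and
    # sum to 1, so the result lies inside the triangle of the three points.
    w = rng.dirichlet(np.ones(3))
    return w[0] * sparse_pt + w[1] * random_pt + w[2] * center

rng = np.random.default_rng(0)

# Toy minority-class sub-cluster (illustrative data only).
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
center = minority.mean(axis=0)          # stand-in for a sub-cluster center
sparse_pt = minority[0]                 # stand-in for a "sparse" sample
random_pt = minority[rng.integers(len(minority))]

new_pt = three_point_sample(sparse_pt, random_pt, center, rng)
print(new_pt)
```

Because the weights form a convex combination, the synthetic point cannot fall outside the region spanned by the three source points, which is one plausible reading of how interpolating toward a sub-cluster center could limit boundary blurring.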