1. Synthetic oversampling method for boundary minority samples based on Tomek links.
- Author
- 陶佳晴, 贺作伟, 冷强奎, 翟军昌, and 孟祥福
- Subjects
- *K-nearest neighbor classification, *INTERPOLATION, *ALGORITHMS, *NEIGHBORS, *CLASSIFICATION
- Abstract
- In a class-imbalanced dataset, samples close to the class boundary are more likely to be misclassified, so accurately identifying boundary samples is important for classification. Existing methods usually use K-nearest neighbors to identify boundary samples, but their accuracy leaves room for improvement. To address this problem, this paper proposes a synthetic oversampling method for boundary minority samples based on Tomek links. The method first finds pairs of samples from different classes that are nearest to each other and forms Tomek links from them. It then uses these links to identify the minority samples located at the inter-class boundary. Next, it applies the linear interpolation mechanism of the synthetic minority oversampling technique (SMOTE) between the boundary samples and their minority-class neighbors, thereby balancing the dataset. Comparison experiments with eight sampling algorithms show that the proposed method obtains higher G-mean and F1 values on most of the datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2023
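
The three steps named in the abstract (form Tomek links, pick out the minority samples that belong to a link, interpolate SMOTE-style toward minority neighbors) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function name `tomek_boundary_oversample`, the Euclidean distance, the two-class setting, the default `k=5`, and the uniform random choice of boundary sample and neighbor are all assumptions made for illustration; the paper may differ on any of them.

```python
import numpy as np


def tomek_boundary_oversample(X, y, minority_label, k=5, n_new=None, seed=0):
    """Hypothetical sketch: oversample minority samples that lie on Tomek links.

    Assumes a binary problem and Euclidean distance. Returns (X_new, y_new).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    idx = np.arange(len(X))

    # Pairwise Euclidean distances; the diagonal is set to inf so that a
    # sample is never its own nearest neighbor.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)

    # Tomek link: two samples of different classes that are each other's
    # nearest neighbor.
    in_link = (nn[nn] == idx) & (y != y[nn])

    # Boundary minority samples are the minority members of a Tomek link.
    boundary = idx[in_link & (y == minority_label)]
    minority = idx[y == minority_label]

    # By default, generate enough samples to equalize the two class sizes.
    if n_new is None:
        n_new = int((y != minority_label).sum() - len(minority))
    if len(boundary) == 0 or len(minority) < 2 or n_new <= 0:
        return X, y

    k_eff = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        b = rng.choice(boundary)
        # k nearest minority neighbors of b; d[b, b] is inf, so b itself
        # sorts last and is never selected.
        neighbors = minority[np.argsort(d[b, minority])[:k_eff]]
        n = rng.choice(neighbors)
        # SMOTE-style linear interpolation between b and its neighbor.
        gap = rng.random()
        synthetic.append(X[b] + gap * (X[n] - X[b]))

    X_new = np.vstack([X, np.array(synthetic)])
    y_new = np.concatenate([y, np.full(n_new, minority_label, dtype=y.dtype)])
    return X_new, y_new
```

A call such as `tomek_boundary_oversample(X, y, minority_label=1)` returns the original samples followed by the synthetic ones. The full pairwise distance matrix keeps the sketch short but uses memory quadratic in the number of samples, so a nearest-neighbor index would likely be preferable on larger datasets.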