1. DBMUTE: density-based majority under-sampling technique
- Author
Chumphol Bunkhumpornpat and Krung Sinapiromsaran
- Subjects
0209 industrial biotechnology, business.industry, Cluster graph, Pattern recognition, Probability density function, 02 engineering and technology, Minority class, Under sampling, Human-Computer Interaction, Density based, Class imbalance, ComputingMethodologies_PATTERNRECOGNITION, 020901 industrial engineering & automation, Artificial Intelligence, Hardware and Architecture, Shortest path problem, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, Classifier (UML), Software, Information Systems, Mathematics
- Abstract
Class imbalance is a challenging problem in which classifiers perform unsatisfactorily on the minority class. A trivial classifier is biased against minority instances because they make up only a tiny fraction of the data. In this paper, our density function is defined as the distance along the shortest path between each majority instance and a minority-cluster pseudo-centroid in an underlying cluster graph. A short path implies highly overlapping, dense minority instances. In contrast, a long path indicates a sparsity of instances. A new under-sampling algorithm is proposed to eliminate majority instances with low distances, because these instances are insignificant and obscure the classification boundary in the overlapping region. The results show improved minority-class predictions for various classifiers across different UCI datasets.
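The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the cluster graph or the pseudo-centroid is built, so the sketch assumes an eps-neighbourhood graph over all instances, a single minority cluster, and a pseudo-centroid taken as the minority instance nearest the minority mean. The names `eps`, `threshold`, `pseudo_centroid`, `shortest_path_distances`, and `undersample_majority` are hypothetical.

```python
import heapq
import math


def pseudo_centroid(points, labels, minority=1):
    """Assumption: the minority instance closest to the minority mean."""
    idx = [i for i, y in enumerate(labels) if y == minority]
    dim = len(points[0])
    mean = [sum(points[i][k] for i in idx) / len(idx) for k in range(dim)]
    return min(idx, key=lambda i: math.dist(points[i], mean))


def shortest_path_distances(points, source, eps):
    """Dijkstra over an eps-neighbourhood graph (assumed stand-in for the
    cluster graph); edges are weighted by Euclidean distance."""
    n = len(points)
    dist = [math.inf] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v in range(n):
            if v == u:
                continue
            w = math.dist(points[u], points[v])
            if w <= eps and d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def undersample_majority(points, labels, eps, threshold, minority=1):
    """Return indices to keep: every minority instance, plus majority
    instances whose path distance to the pseudo-centroid exceeds threshold."""
    source = pseudo_centroid(points, labels, minority)
    dist = shortest_path_distances(points, source, eps)
    return [i for i, y in enumerate(labels)
            if y == minority or dist[i] > threshold]


# Toy data: three minority points, two overlapping majority points,
# and three majority points far from the minority cluster.
points = [(0, 0), (0.5, 0), (1, 0), (0.6, 0.3), (1.2, 0.2),
          (3, 0), (3.5, 0), (4, 0)]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
kept = undersample_majority(points, labels, eps=1.0, threshold=1.0)
print(kept)
```

In this toy example the two majority points lying inside the minority region have short paths to the pseudo-centroid and are dropped, while the distant majority points are unreachable in the graph (infinite distance) and are kept.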
- Published
- 2016