Relevant information undersampling to support imbalanced data classification
- Source :
- Neurocomputing. 436:136-146
- Publication Year :
- 2021
- Publisher :
- Elsevier BV, 2021.
Abstract
- Traditional classification algorithms assume that samples are evenly distributed among classes. When this assumption fails, performance is biased toward the majority class. This paper proposes a Relevant Information-based UnderSampling (RIUS) approach that selects the most relevant examples from the majority class to improve classification performance in imbalanced data scenarios. RIUS builds on the information-preservation principle, which extracts the majority class’s underlying structure with fewer samples. Additionally, we couple RIUS with the well-known Clustering-based Undersampling algorithm (CBUS) to enhance the data representation, and name this enhancement CRIUS. Experimental results show that RIUS and CRIUS reveal the data’s relevant structure and reduce the loss of information by selecting the most informative instances.
- Subjects :
- Structure (mathematical logic)
0209 industrial biotechnology
Computer science
Cognitive Neuroscience
02 engineering and technology
Imbalanced data
Computer Science Applications
Statistical classification
020901 industrial engineering & automation
Sampling distribution
Artificial Intelligence
Undersampling
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Data mining
Cluster analysis
Relevant information
Details
- ISSN :
- 09252312
- Volume :
- 436
- Database :
- OpenAIRE
- Journal :
- Neurocomputing
- Accession number :
- edsair.doi...........08503ebc6480d91eb1a28186f9188dc6
- Full Text :
- https://doi.org/10.1016/j.neucom.2021.01.033
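
The abstract mentions clustering-based undersampling of the majority class. As a rough illustration of that general idea only, the sketch below clusters the majority class into as many groups as there are minority samples and keeps the real sample nearest to each centroid. It is not the RIUS or CRIUS method, and it may differ from the CBUS variant used in the paper; the function names (`kmeans`, `cluster_undersample`), the plain k-means routine, and the synthetic data are all assumptions made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd-style k-means; returns k centroids (hypothetical helper)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct positions
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids


def cluster_undersample(majority, minority, seed=0):
    """Keep one real majority sample per cluster, with one cluster per minority sample."""
    k = min(len(minority), len(majority))
    centroids = kmeans(majority, k, seed=seed)
    remaining = list(majority)
    selected = []
    for c in centroids:
        nearest = min(remaining, key=lambda p: squared_distance(p, c))
        selected.append(nearest)
        remaining.remove(nearest)  # never pick the same sample twice
    return selected


# Synthetic, assumed data: 200 majority points and 10 minority points in 2-D.
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(10)]

balanced_majority = cluster_undersample(majority, minority)
print(len(balanced_majority), "majority samples kept for", len(minority), "minority samples")
```

Selecting real samples near the centroids, rather than the centroids themselves, keeps the reduced set made up of observed data points; other selection rules are equally possible under this scheme.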