Relevant information undersampling to support imbalanced data classification

Authors :
Andrés Marino Álvarez-Meza
Genaro Daza-Santacoloma
J. Hoyos-Osorio
Germán Castellanos-Domínguez
Álvaro-Ángel Orozco-Gutierrez
Source :
Neurocomputing. 436:136-146
Publication Year :
2021
Publisher :
Elsevier BV, 2021.

Abstract

Traditional classification algorithms assume that samples are evenly distributed among classes. When the data are imbalanced, this assumption biases performance toward the majority class. This paper proposes a Relevant Information-based UnderSampling (RIUS) approach that selects the most relevant examples from the majority class to improve classification performance in imbalanced data scenarios. RIUS builds on the information-preservation principle, which extracts the majority class’s underlying structure with fewer samples. Additionally, we couple RIUS with the well-known Clustering-based Undersampling algorithm (CBUS) to enhance the data representation, and name this enhancement CRIUS. Experimental results show that RIUS and CRIUS reveal the data’s relevant structure and reduce information loss by selecting the most informative instances.
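
For orientation only, the general idea behind clustering-based undersampling can be sketched as follows. This is not the RIUS or CRIUS method from the paper, and it may differ from the CBUS variant the authors use: it is a minimal illustration that assumes plain k-means with a deterministic start and keeps the real majority sample closest to each centroid. The function names `kmeans` and `cluster_undersample` and the toy data are hypothetical.

```python
import math

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; centroids start at the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[j].append(p)
        # Recompute each centroid as the mean of its cluster (keep it if empty).
        new = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids

def cluster_undersample(majority, n_keep):
    """For each of n_keep centroids, keep the nearest not-yet-kept majority sample."""
    centroids = kmeans(majority, n_keep)
    kept = []
    for c in centroids:
        candidates = [p for p in majority if p not in kept]
        kept.append(min(candidates, key=lambda p: math.dist(p, c)))
    return kept

# Toy data (hypothetical): six majority samples in two groups, two minority samples.
majority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
minority = [(2.5, 2.5), (2.6, 2.4)]

# Shrink the majority class to the size of the minority class.
reduced = cluster_undersample(majority, n_keep=len(minority))
print(reduced)
```

On this toy data the reduced majority set has one representative per group, so the class sizes match while both regions of the majority class stay covered.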

Details

ISSN :
0925-2312
Volume :
436
Database :
OpenAIRE
Journal :
Neurocomputing
Accession number :
edsair.doi...........08503ebc6480d91eb1a28186f9188dc6
Full Text :
https://doi.org/10.1016/j.neucom.2021.01.033