Global reliable data generation for imbalanced binary classification with latent codes reconstruction and feature repulsion.

Authors :
Jia, Xin
Gao, Xin
Chen, Wenli
Cheng, Yingying
Meng, Zhihang
Xue, Bing
Huang, Zijian
Fu, Shiyuan
Source :
Applied Intelligence; Jul2023, Vol. 53 Issue 13, p16922-16960, 39p
Publication Year :
2023

Abstract

Common binary classification algorithms that learn directly from imbalanced data tend to be biased towards the majority class. Although over-sampling techniques can address the imbalance, the realism of the synthesized samples cannot be guaranteed. Generative Adversarial Networks can improve the authenticity of generated samples, but they may suffer from mode collapse, which changes the distribution of the minority class after balancing. This paper proposes a sample-level data generation method for imbalanced classification. First, we present a latent code reconstruction technique with mutual information constraints for global data generation. The latent codes of an input sample are divided into a latent vector of key features and a latent vector of subordinate features. Mutated latent codes are obtained by retaining the key-feature latent vector and randomly replacing the subordinate-feature latent vector. Reliable, similar mutated samples are then generated through decoder restoration, the mutual information constraint, and adversarial discrimination. In addition, a feature repulsion technique and a combination coding technique are proposed to address feature extraction and classification for samples in overlapping regions. The former performs supervised feature representation learning on the key-feature latent vector. The latter superimposes the per-dimension reconstruction error of the sample as a supplement to the key-feature latent vector. Extensive experiments on public datasets with a variety of typical base classifiers show that the proposed method outperforms other typical data balancing methods in F1-Measure and G-Mean. [ABSTRACT FROM AUTHOR]
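The abstract names three concrete operations: mutating latent codes by keeping the key-feature part and replacing the subordinate part, supplementing the key latent vector with per-dimension reconstruction error, and scoring with F1-Measure and G-Mean. The NumPy sketch below illustrates them under stated assumptions and is not the authors' implementation. The function names `mutate_latent`, `combination_code` and `f1_gmean` are hypothetical; the latent vector is assumed to be split positionally (first `n_key` dimensions are the key features); replacement subordinate codes are assumed to be drawn from a standard normal prior; and "superimposing" the reconstruction error is modelled as concatenating absolute per-dimension errors. The encoder, decoder, mutual information constraint, discriminator and feature repulsion loss are omitted.

```python
import numpy as np


def mutate_latent(z, n_key, rng, n_mutations=1):
    """Keep the key-feature latent vector, randomly replace the subordinate one.

    Assumption: the first n_key dimensions of z are the key features and the
    replacement subordinate codes come from a standard normal prior.
    """
    z = np.asarray(z, dtype=float)
    key = z[:n_key]
    n_sub = z.shape[0] - n_key
    sub = rng.standard_normal((n_mutations, n_sub))  # fresh subordinate codes
    # Each row: unchanged key part followed by a newly sampled subordinate part
    return np.hstack([np.tile(key, (n_mutations, 1)), sub])


def combination_code(z_key, x, x_rec):
    """Append the per-dimension reconstruction error |x - x_rec| to z_key.

    Assumption: "superimposing" is modelled here as concatenation.
    """
    err = np.abs(np.asarray(x, dtype=float) - np.asarray(x_rec, dtype=float))
    return np.concatenate([np.asarray(z_key, dtype=float), err])


def f1_gmean(y_true, y_pred, positive=1):
    """F1-Measure and G-Mean for a binary task with `positive` as the minority class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    tn = np.sum((y_pred != positive) & (y_true != positive))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity on the minority class
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    gmean = np.sqrt(recall * specificity)  # geometric mean of the two class-wise recalls
    return float(f1), float(gmean)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = np.array([0.5, -1.2, 0.3, 0.8])
    print(mutate_latent(z, n_key=2, rng=rng, n_mutations=3).shape)  # (3, 4)
    print(combination_code([0.1, 0.2], [1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))
    print(f1_gmean([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 1, 0]))  # F1 = 0.5, G-Mean ~ 0.612
```

G-Mean rewards balanced recall on both classes, which is why it is commonly reported alongside F1-Measure for imbalanced data: a classifier that predicts only the majority class scores a G-Mean of 0.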

Details

Language :
English
ISSN :
0924-669X
Volume :
53
Issue :
13
Database :
Complementary Index
Journal :
Applied Intelligence
Publication Type :
Academic Journal
Accession number :
164661403
Full Text :
https://doi.org/10.1007/s10489-022-04330-5