
An overlapping oriented imbalanced ensemble learning algorithm with weighted projection clustering grouping and consistent fuzzy sample transformation.

Authors :
Li, Fan
Wang, Bo
Shen, Yinghua
Wang, Pin
Li, Yongming
Source :
Information Sciences. Aug2023, Vol. 637, pN.PAG-N.PAG. 1p.
Publication Year :
2023

Abstract

Class imbalance and class overlapping can occur simultaneously in imbalanced learning, yet most existing algorithms focus mainly on the former. Although some recent algorithms address class overlapping, they do not identify the overlapping region effectively, which results in a loss of sample information, and they operate directly on the original, low-quality samples. To address these problems, this paper proposes an imbalanced ensemble learning algorithm based on weighted projection clustering grouping and consistent fuzzy sample transformation (PCGDST-IE). Firstly, a weighted projection clustering combination framework (WPCC), guided by the Davies-Bouldin clustering effectiveness index (DBI), is designed to obtain high-quality clusters, and the clusters are combined to form cross-complete subsets (CCS) with low overlapping. Secondly, a stage-wise hybrid sampling (SHS) algorithm is designed to de-overlap and balance the subsets. Finally, a local–global structure consistency mechanism (LGSCM) is constructed by fuzzy clustering and domain adaptation, thereby reducing class overlapping and improving the quality of the samples in the subsets. Weak classifiers are then trained on the balanced subsets and fused. More than 30 public datasets and over ten representative algorithms are chosen to verify the proposed method. The experimental results show that PCGDST-IE is significantly better in terms of anti-overlapping, Recall, F1-M, G-M, AUC, and diversity. The main contributions of the paper are: (a) proposing the WPCC to realize weighted projection clustering for subset generation; (b) proposing the SHS to better handle class imbalance and overlapping; (c) proposing the LGSCM for sample transformation to improve the quality of the subsets; and (d) forming an imbalanced ensemble algorithm that better solves the class imbalance and class overlapping problems simultaneously. [ABSTRACT FROM AUTHOR]
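
The abstract names the Davies-Bouldin index (DBI) as the criterion that guides cluster selection. As a rough illustration of that criterion only, the sketch below computes the standard DBI for candidate partitions of a toy data set and keeps the partition with the lowest score. It is not the paper's WPCC framework: the function names (`centroid`, `davies_bouldin`, `pick_best`), the Euclidean distance, and the toy points are all assumptions made for this example.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def davies_bouldin(clusters):
    """Standard Davies-Bouldin index for a list of clusters.

    Each cluster is a list of points (tuples). Lower values indicate
    compact, well-separated clusters. Assumes distinct centroids.
    """
    if len(clusters) < 2:
        raise ValueError("DBI needs at least two clusters")
    centers = [centroid(c) for c in clusters]
    # Scatter: mean Euclidean distance from each member to its centroid.
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    total = 0.0
    for i in range(len(clusters)):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(len(clusters)) if j != i
        )
    return total / len(clusters)

def pick_best(candidates):
    """Return the name of the candidate partition with the lowest DBI."""
    return min(candidates, key=lambda name: davies_bouldin(candidates[name]))

# Hypothetical toy data: two tight blobs, partitioned sensibly and badly.
good = [[(0, 0), (0, 1), (1, 0)], [(10, 10), (10, 11), (11, 10)]]
bad = [[(0, 0), (10, 10), (1, 0)], [(0, 1), (10, 11), (11, 10)]]
candidates = {"good": good, "bad": bad}
best = pick_best(candidates)
```

Here `best` is the partition that keeps each blob together, since mixing the blobs inflates within-cluster scatter while pulling the centroids closer. How the paper weights projections or combines clusters into subsets is not stated in the abstract and is not modeled here.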

Details

Language :
English
ISSN :
00200255
Volume :
637
Database :
Academic Search Index
Journal :
Information Sciences
Publication Type :
Periodical
Accession number :
163469748
Full Text :
https://doi.org/10.1016/j.ins.2023.118955