
Adaptive fuzzy multi-neighborhood feature selection with hybrid sampling and its application for class-imbalanced data.

Authors :
Sun, Lin
Li, Mengmeng
Ding, Weiping
Xu, Jiucheng
Source :
Applied Soft Computing; Dec 2023, Part A, Vol. 149, 1p
Publication Year :
2023

Abstract

For imbalanced data, classification performance degrades significantly because information about the positive (minority) class is insufficient, and existing sampling schemes do not consider the distribution of samples. In addition, the global parameters of fuzzy neighborhoods are set manually. These defects reduce the effectiveness of the classifier. To address these problems, we propose an adaptive fuzzy multi-neighborhood feature selection method with intercluster distance-based hybrid sampling for class-imbalanced data. First, the number of clusters is defined according to the number of samples in the negative or positive class. The initial cluster centers are determined from the number of clusters, and the dissimilarity and similarity measures are calculated from the intercluster distances between samples. Then, an optimization objective function is designed based on the cluster centers, the fuzzy membership matrix, and the intercluster distances. The hybrid sampling scheme combines the generated positive class samples and negative class samples to obtain a class-balanced system. Second, a set of adaptive fuzzy multi-neighborhood radii is designed from the standard deviation of the sample distribution. A fuzzy multi-neighborhood similarity relation is defined by introducing a Gaussian kernel to obtain fuzzy multi-neighborhood granules, and an improved fuzzy multi-neighborhood rough set model is presented. Uncertainty measures of fuzzy neighborhood systems are evaluated by the positive region and dependency. Third, by integrating fuzzy dependency with fuzzy complementary conditional entropy, fuzzy multi-neighborhood complementary mutual information is constructed from both the algebraic and the information-theoretic viewpoints. Finally, a heuristic feature subset selection method for imbalanced classification, combined with hybrid sampling based on fuzzy c-means clustering, is developed to obtain the final feature subset.
Experiments on 26 imbalanced datasets demonstrate the effectiveness of the proposed algorithm.

● The optimization objective function of hybrid sampling via intercluster distance is designed to obtain a class-balanced system.
● A set of adaptive fuzzy multi-neighborhood radii is designed to construct the fuzzy multi-neighborhood similarity relation.
● An adaptive fuzzy multi-neighborhood rough set model is provided to assess uncertainty measures of fuzzy neighborhood systems.
● Fuzzy multi-neighborhood complementary mutual information is constructed from the two viewpoints of algebra and information.

[ABSTRACT FROM AUTHOR]
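
The feature-evaluation steps named in the abstract (Gaussian-kernel fuzzy similarity, positive region, dependency, and heuristic selection) can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It implements a generic Gaussian-kernel fuzzy rough dependency with greedy forward selection, and it omits the intercluster-distance hybrid sampling and the complementary mutual information measure. The per-feature radius `std / lam`, the min t-norm used to intersect per-feature neighborhoods, the Kleene-Dienes implicator, and the parameters `lam` and `eps` are all assumptions chosen for illustration.

```python
import numpy as np


def adaptive_radii(X, lam=2.0):
    """Per-feature Gaussian bandwidths from each feature's standard deviation.

    ASSUMPTION: sigma_a = std(a) / lam, with `lam` a hypothetical scaling
    parameter; the paper's exact radius formula may differ.
    """
    std = X.std(axis=0)
    # Constant features would give a zero bandwidth; fall back to 1.0.
    return np.where(std > 0, std / lam, 1.0)


def fuzzy_similarity(X, features, radii):
    """Fuzzy similarity matrix for a feature subset.

    Each feature contributes a Gaussian kernel; the per-feature relations are
    intersected with the min t-norm (an assumption, not the paper's rule).
    """
    n = X.shape[0]
    R = np.ones((n, n))
    for a in features:
        diff = X[:, a][:, None] - X[:, a][None, :]
        R = np.minimum(R, np.exp(-diff ** 2 / (2.0 * radii[a] ** 2)))
    return R


def dependency(X, y, features, radii):
    """Fuzzy dependency = mean membership in the fuzzy positive region.

    With the Kleene-Dienes implicator max(1 - R, D), the lower approximation
    of a sample's own class is its minimum dissimilarity (1 - R) to any
    sample from a different class.
    """
    R = fuzzy_similarity(X, features, radii)
    other_class = y[:, None] != y[None, :]
    pos = np.where(other_class, 1.0 - R, 1.0).min(axis=1)
    return float(pos.mean())


def greedy_select(X, y, lam=2.0, eps=1e-3):
    """Greedy forward selection: add the feature with the largest dependency
    until the gain drops to `eps` or below (hypothetical stopping rule)."""
    radii = adaptive_radii(X, lam)
    selected, remaining, best = [], list(range(X.shape[1])), 0.0
    while remaining:
        scores = [(dependency(X, y, selected + [a], radii), a) for a in remaining]
        score, a = max(scores, key=lambda s: s[0])
        if score - best <= eps:
            break
        selected.append(a)
        remaining.remove(a)
        best = score
    return selected, best


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is mostly noise.
    X = np.array([[0.0, 0.5], [0.1, 0.1], [0.2, 0.9],
                  [1.0, 0.4], [1.1, 0.8], [0.9, 0.2]])
    y = np.array([0, 0, 0, 1, 1, 1])
    print(greedy_select(X, y))
```

Because the min t-norm can only lower similarities as features are added, the dependency in this sketch is non-decreasing in the feature subset, so each greedy step has a non-negative gain and the `eps` threshold acts as a simple stopping rule.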

Details

Language :
English
ISSN :
1568-4946
Volume :
149
Database :
Supplemental Index
Journal :
Applied Soft Computing
Publication Type :
Academic Journal
Accession number :
173726267
Full Text :
https://doi.org/10.1016/j.asoc.2023.110968