New uncertainty measurement for categorical data based on fuzzy information structures: An application in attribute reduction.

Authors :
Zhang, Qinli
Chen, Yiying
Zhang, Gangqiang
Li, Zhaowen
Chen, Lijun
Wen, Ching-Feng
Source :
Information Sciences. Nov2021, Vol. 580, p541-577. 37p.
Publication Year :
2021

Abstract

Categorical data is a significant kind of data in machine learning. Generally, rough set theory (RS-theory) deals with categorical data in the following way. First, an equivalence relation based on the equality of attribute values of categorical data is established. Then, information granules (I-granules) based on equivalence classes are obtained. Finally, information structures (I-structures) consisting of I-granules are formed. However, an equivalence relation is too strict, and there are some limitations in the I-structure of a categorical information system (CIS) that may result in filtering out potentially useful information. This paper investigates fuzzy information structures (FI-structures) and new uncertainty measurements for categorical data from the perspective that "the equality of attribute values is fed back to the attribute set". First, a fuzzy symmetry relation based on the number of attributes with equal attribute values is established. Then, fuzzy information granules (FI-granules) based on the fuzzy symmetry relation are obtained.
Next, FI-structures consisting of FI-granules are formed. Finally, some concepts related to FI-structures in a CIS are given. The set vector is used to denote FI-structures, and the inclusion degree is used to study the dependence between FI-structures. In addition, four new uncertainty measurements based on FI-structures in a CIS are proposed, including fuzzy information granulation (G_f), fuzzy information entropy (H_f), fuzzy rough entropy ((E_r)_f) and fuzzy information amount (E_f). Moreover, numerical experiments and statistical tests to evaluate the performance of the proposed new measurements are carried out. The results of the paired t-test show that the performance of the four new measurements based on FI-structures is better than that of the corresponding four measurements based on I-structures. Finally, attribute reduction algorithms based on G_f and H_f are presented, and clustering analysis is conducted on the reduced CIS. The experimental results show that the proposed algorithms are effective and perform well on attribute reduction according to three evaluation indicators of clustering performance. [ABSTRACT FROM AUTHOR]
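
The construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract says only that the fuzzy relation is based on the number of attributes with equal values, so the normalization used here (matching attributes divided by the total number of attributes) is an assumption, as is the granulation formula (the mean membership over all object pairs, a common definition in the fuzzy rough set literature). The names `fuzzy_relation` and `fuzzy_granulation` and the toy table are hypothetical.

```python
from typing import List, Sequence


def fuzzy_relation(table: Sequence[Sequence[str]]) -> List[List[float]]:
    """Build a fuzzy symmetric relation over the objects (rows) of a table.

    Assumption: R(x, y) = (number of attributes on which x and y agree) / (number of attributes).
    Row i of the result plays the role of the fuzzy information granule of object i.
    """
    n = len(table)
    m = len(table[0])
    return [
        [sum(a == b for a, b in zip(table[i], table[j])) / m for j in range(n)]
        for i in range(n)
    ]


def fuzzy_granulation(rel: Sequence[Sequence[float]]) -> float:
    """Assumed granulation measure: average membership over all ordered object pairs."""
    n = len(rel)
    return sum(sum(row) for row in rel) / (n * n)


# Toy categorical information system: 3 objects described by 3 attributes.
table = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
]

rel = fuzzy_relation(table)
granulation = fuzzy_granulation(rel)
```

Under these assumptions, two objects that agree on two of three attributes are related to degree 2/3, whereas an equivalence relation would treat them as simply different. This is the kind of partial similarity that, according to the abstract, a strict equivalence relation filters out.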

Details

Language :
English
ISSN :
00200255
Volume :
580
Database :
Academic Search Index
Journal :
Information Sciences
Publication Type :
Periodical
Accession number :
153291238
Full Text :
https://doi.org/10.1016/j.ins.2021.08.089