Mining incomplete data using global and saturated probabilistic approximations based on characteristic sets and maximal consistent blocks.

Authors :
Clark, Patrick G.
Grzymala-Busse, Jerzy W.
Hippe, Zdzislaw S.
Mroczek, Teresa
Source :
Information Sciences. Mar2024, Vol. 662, pN.PAG-N.PAG. 1p.
Publication Year :
2024

Abstract

In this paper, we discuss a rough set approach to missing attribute values. Among many ways of interpreting missing values, in this paper we focus on two interpretations, lost values and "do not care" conditions. Using these interpretations, global and saturated probabilistic approximations are constructed with two types of granules: characteristic sets and maximal consistent blocks. We compare eight approaches, combining two interpretations of missing attribute values, two types of probabilistic approximations with two types of granules using an error rate that is computed as a result of ten-fold cross-validation. Using a 5% level of statistical significance, we present the experimental results for these eight approaches, showing statistically significant differences between all approaches to mining incomplete data. The results also show that no one method and approach is the best for every data set and that all eight approaches should be attempted. The final section of the paper presents the idea of concept-compatible data sets. We show that for these types of data sets, global and saturated probabilistic approximations for a concept are identical to the concept. We also show that for an incomplete data set with no duplicate rows using the lost interpretation of missing attribute values, the data set is concept-compatible.

• Two interpretations of missing attribute values: lost values and "do not care" conditions are considered
• Global and saturated probabilistic approximations are constructed from characteristic sets and maximal consistent blocks
• Eight approaches to mining incomplete data sets are compared and significant differences between them are indicated
• A novel idea of the concept-compatible data sets is introduced

[ABSTRACT FROM AUTHOR]
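To make the terminology in the abstract concrete, the following is a minimal sketch of how characteristic sets can be computed under the two interpretations of missing values, with "?" standing for a lost value and "*" for a "do not care" condition. It then builds a concept probabilistic approximation, which is a simpler relative of the global and saturated approximations compared in the paper and is not the authors' algorithm. The function names, the tiny data table, and the attribute names are all hypothetical and chosen only for illustration.

```python
from fractions import Fraction

LOST, DONT_CARE = "?", "*"  # assumed symbols for the two interpretations

# Hypothetical incomplete data set: case id -> attribute values.
table = {
    1: {"Temperature": "high",   "Headache": "yes"},
    2: {"Temperature": LOST,      "Headache": "yes"},
    3: {"Temperature": DONT_CARE, "Headache": "yes"},
    4: {"Temperature": "normal", "Headache": "no"},
}
attributes = ["Temperature", "Headache"]


def attribute_value_block(table, attribute, value):
    """Cases with `value` on `attribute`, plus all "do not care" cases.

    Cases with a lost value on `attribute` belong to no block.
    """
    return {
        case for case, row in table.items()
        if row[attribute] == value or row[attribute] == DONT_CARE
    }


def characteristic_set(table, case, attributes):
    """Intersect the blocks of the specified values of `case`.

    A missing value (lost or "do not care") places no restriction,
    which is the same as intersecting with the whole universe.
    """
    result = set(table)
    for attribute in attributes:
        value = table[case][attribute]
        if value not in (LOST, DONT_CARE):
            result &= attribute_value_block(table, attribute, value)
    return result


def concept_probabilistic_approximation(table, concept, attributes, alpha):
    """Union of K(x) for x in the concept with Pr(concept | K(x)) >= alpha."""
    approximation = set()
    for case in concept:
        k = characteristic_set(table, case, attributes)
        # k always contains `case` itself, so len(k) is never zero.
        if Fraction(len(k & concept), len(k)) >= alpha:
            approximation |= k
    return approximation


concept = {1, 2}  # hypothetical concept, e.g. cases with a positive decision
print(sorted(characteristic_set(table, 1, attributes)))  # [1, 3]
print(sorted(concept_probabilistic_approximation(
    table, concept, attributes, Fraction(2, 3))))        # [1, 2, 3]
```

With alpha equal to 1 the same function returns the empty set for this table, since no characteristic set of a case in the concept lies entirely inside the concept. Exact fractions are used for the conditional probability so that thresholds such as 2/3 are compared without floating-point rounding.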

Details

Language :
English
ISSN :
0020-0255
Volume :
662
Database :
Academic Search Index
Journal :
Information Sciences
Publication Type :
Periodical
Accession number :
175456753
Full Text :
https://doi.org/10.1016/j.ins.2024.120287