Back to Search Start Over

Possibilistic Data Cleaning.

Authors :
Koehler, Henning
Link, Sebastian
Source :
IEEE Transactions on Knowledge & Data Engineering. Dec2022, Vol. 34 Issue 12, p5939-5950. 12p.
Publication Year :
2022

Abstract

Classical data cleaning performs a minimal set of operations on the data to satisfy the given integrity constraints. Often, this minimization is equivalent to vertex cover, for example when tuples can be removed due to the violation of functional dependencies. Classically, the uncertainty of tuples and constraints is ignored. We propose not to view data as dirty but the uncertainty information about data. Since probabilities are often unavailable and their treatment is limited due to correlations in the data, we investigate a qualitative approach to uncertainty. Tuples are assigned degrees of possibility with which they occur, and constraints are assigned degrees of certainty that say to which tuples they apply. Our approach is non-invasive to the data as we lower the possibility degree of tuples as little as possible. The new resulting qualitative version of vertex cover remains NP-hard. We establish an algorithm that is fixed-parameter tractable in the size of the qualitative vertex cover. Experiments with synthetic and real-world data show that our algorithm outperforms the classical algorithm proportionally to the available number of uncertainty degrees. By mining the certainty degrees with which constraints hold, our framework becomes applicable even when uncertainty information is unavailable. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
10414347
Volume :
34
Issue :
12
Database :
Academic Search Index
Journal :
IEEE Transactions on Knowledge & Data Engineering
Publication Type :
Academic Journal
Accession number :
160692086
Full Text :
https://doi.org/10.1109/TKDE.2021.3062318