A Hybrid Data Cleaning Framework Using Markov Logic Networks.
- Source :
- IEEE Transactions on Knowledge & Data Engineering. May 2022, Vol. 34, Issue 5, p. 2048-2062. 15p.
- Publication Year :
- 2022
Abstract
- With the increase of dirty data, data cleaning has become a crux of data analysis. The accuracy of existing integrity-constraint-based cleaning approaches is limited by insufficient rules. In this paper, we present a novel hybrid data cleaning framework on top of Markov logic networks (MLNs), termed MLNClean, which is capable of learning instantiated rules to supplement the insufficient integrity constraints. MLNClean consists of two steps: pre-processing and two-stage data cleaning. In the pre-processing step, MLNClean first infers a set of probable instantiated rules according to MLNs and then builds a two-layer MLN index structure to generate multiple data versions and facilitate the cleaning process. In the two-stage data cleaning step, MLNClean first introduces a reliability score to clean errors within each data version separately, and then eliminates conflicting values among different data versions using a novel fusion score. Extensive experimental results on both real and synthetic scenarios demonstrate the effectiveness of MLNClean in practice. [ABSTRACT FROM AUTHOR]
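The abstract does not define MLNClean's reliability or fusion scores, so the following is only a minimal sketch of the standard Markov logic network idea the framework builds on: a world x has probability proportional to exp(sum_i w_i * n_i(x)), where n_i(x) counts the true groundings of weighted rule i. Here each rule is grounded once per tuple (so n_i is 0 or 1), and the normalizing constant is restricted to a small set of candidate repairs, which is a simplification. The rules, weights, attribute names (city, state, zip), and helper functions are all hypothetical and are not taken from the paper.

```python
import math
from typing import Callable, Dict, List, Tuple

# A tuple is a mapping from attribute name to value.
Row = Dict[str, str]
# A weighted rule: (weight, predicate that returns True when the rule holds).
WeightedRule = Tuple[float, Callable[[Row], bool]]


def log_linear_score(row: Row, rules: List[WeightedRule]) -> float:
    """Sum of the weights of satisfied rules: sum_i w_i * n_i(x), with n_i in {0, 1}."""
    return sum(w for w, holds in rules if holds(row))


def candidate_probabilities(candidates: List[Row], rules: List[WeightedRule]) -> List[float]:
    """Normalize exp(score) over the candidate set only (Z restricted to these candidates)."""
    scores = [log_linear_score(c, rules) for c in candidates]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


# Hypothetical instantiated rules, written as implications A => B, i.e. (not A) or B.
rules: List[WeightedRule] = [
    (2.0, lambda r: r["city"] != "Boston" or r["state"] == "MA"),
    (1.0, lambda r: r["zip"] != "02134" or r["city"] == "Boston"),
]

# Two hypothetical candidate versions of one dirty tuple.
candidates: List[Row] = [
    {"city": "Boston", "state": "MA", "zip": "02134"},
    {"city": "Boston", "state": "NY", "zip": "02134"},
]

probs = candidate_probabilities(candidates, rules)
best = candidates[probs.index(max(probs))]
print(probs, best)
```

The first candidate satisfies both rules (score 3.0) and the second violates the heavier rule (score 1.0), so the first receives probability 1 / (1 + e^-2), roughly 0.88. Real MLN systems learn the weights from data and perform inference over many interacting groundings rather than scoring a fixed candidate list.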
- Subjects :
- *LOGIC
- *MARKOV processes
Details
- Language :
- English
- ISSN :
- 10414347
- Volume :
- 34
- Issue :
- 5
- Database :
- Academic Search Index
- Journal :
- IEEE Transactions on Knowledge & Data Engineering
- Publication Type :
- Academic Journal
- Accession number :
- 156273272
- Full Text :
- https://doi.org/10.1109/TKDE.2020.3012472