A Hybrid Data Cleaning Framework Using Markov Logic Networks.

Authors :
Ge, Congcong
Gao, Yunjun
Miao, Xiaoye
Yao, Bin
Wang, Haobo
Source :
IEEE Transactions on Knowledge & Data Engineering. May 2022, Vol. 34 Issue 5, p2048-2062. 15p.
Publication Year :
2022

Abstract

With the increase of dirty data, data cleaning turns into a crux of data analysis. The accuracy limitation of the existing integrity constraints-based cleaning approaches results from insufficient rules. In this paper, we present a novel hybrid data cleaning framework on top of Markov logic networks (MLNs), termed MLNClean, which is capable of learning instantiated rules to supplement the insufficient integrity constraints. MLNClean consists of two steps, i.e., pre-processing and two-stage data cleaning. In the pre-processing step, MLNClean first infers a set of probable instantiated rules according to MLNs and then builds a two-layer MLN index structure to generate multiple data versions and facilitate the cleaning process. In the two-stage data cleaning step, MLNClean first presents a concept of reliability score to clean errors within each data version separately, and afterward eliminates the conflicting values among different data versions using a novel concept of fusion score. Considerable experimental results on both real and synthetic scenarios demonstrate the effectiveness of MLNClean in practice. [ABSTRACT FROM AUTHOR]

Subjects

Subjects :
*LOGIC
*MARKOV processes

Details

Language :
English
ISSN :
10414347
Volume :
34
Issue :
5
Database :
Academic Search Index
Journal :
IEEE Transactions on Knowledge & Data Engineering
Publication Type :
Academic Journal
Accession number :
156273272
Full Text :
https://doi.org/10.1109/TKDE.2020.3012472