Incrementally updating unary inclusion dependencies in dynamic data

Authors :
Christoph Meinel
Nuhad Shaabani
Source :
Distributed and Parallel Databases. 37:133-176
Publication Year :
2018
Publisher :
Springer Science and Business Media LLC, 2018.

Abstract

Inclusion dependencies form one of the most fundamental classes of integrity constraints. Their importance in classical data management is reinforced by modern applications such as data profiling, data cleaning, entity resolution, and schema matching. Their discovery in an unknown dataset is at the core of any data-analysis effort. Several research approaches have therefore focused on their efficient discovery in a given, static dataset. However, none of these approaches is suitable for dynamic datasets, where discovery techniques should be able to efficiently update the inclusion dependencies after an update to the dataset, without reprocessing the entire dataset. We present the first approach for incrementally updating unary inclusion dependencies. Our approach is based on the concept of attribute clustering, from which the unary inclusion dependencies are efficiently derivable. We incrementally update the clusters after each update of the dataset. Updating the clusters does not require access to the dataset, because special data structures are designed to efficiently support the updating process. We performed an exhaustive analysis of our approach by applying it to large datasets with several hundred attributes and more than 116.2 million tuples. The results show that incremental discovery significantly reduces the runtime compared with static discovery, by up to 99.9996% for both insertions and deletions.
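
The abstract does not describe the paper's actual data structures, so the following is only a minimal sketch of the general idea, under stated assumptions: a "cluster" is taken to be the set of attributes in which a given value occurs, and a unary inclusion dependency A ⊆ B is taken to hold when B belongs to every cluster that contains A. The class name `UnaryIndIndex`, its methods, and the per-value occurrence counters are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


class UnaryIndIndex:
    """Illustrative sketch only; not the authors' implementation."""

    def __init__(self):
        # value -> Counter(attribute -> number of occurrences of that value)
        # The key set of each Counter plays the role of an attribute cluster.
        self.counts = defaultdict(Counter)

    def insert(self, row):
        # row is a dict: attribute name -> value
        for attr, value in row.items():
            self.counts[value][attr] += 1

    def delete(self, row):
        # Assumes the row was inserted earlier; counters let a value
        # leave an attribute's cluster only when its last copy is gone.
        for attr, value in row.items():
            cluster = self.counts[value]
            cluster[attr] -= 1
            if cluster[attr] <= 0:
                del cluster[attr]
            if not cluster:
                del self.counts[value]

    def inds(self):
        # A is included in B iff B is in every cluster that contains A.
        # Only the clusters are scanned, never the original rows.
        # Attributes with no values do not appear and are skipped.
        refs = {}
        for cluster in self.counts.values():
            attrs = set(cluster)
            for a in attrs:
                if a not in refs:
                    refs[a] = attrs - {a}
                else:
                    refs[a] &= attrs
        return {(a, b) for a, bs in refs.items() for b in bs}


if __name__ == "__main__":
    idx = UnaryIndIndex()
    idx.insert({"A": 1, "B": 1})
    idx.insert({"A": 1, "B": 2})
    print(idx.inds())
```

In this sketch, insertions and deletions touch only the clusters of the values in the changed row, which is one simple way to avoid rereading the dataset. Deriving the dependencies here still scans all clusters; the paper's method may well do this step incrementally too.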

Details

ISSN :
1573-7578 and 0926-8782
Volume :
37
Database :
OpenAIRE
Journal :
Distributed and Parallel Databases
Accession number :
edsair.doi...........e04486aa73ffb8871346b0d349b45766
Full Text :
https://doi.org/10.1007/s10619-018-7233-5