On the Reusability of Data Cleaning Workflows

Authors :
Li, Lan
Ludäscher, Bertram
Publication Year :
2022
Publisher :
Zenodo, 2022.

Abstract

The goal of data cleaning is to make data fit for purpose, i.e., to improve data quality through updates and data transformations, such that downstream analyses can be conducted and lead to trustworthy results. A transparent and reusable data cleaning workflow can save time and effort through automation, and make subsequent data cleaning on new data less error-prone. However, the reusability of data cleaning workflows has received little to no attention in the research community. We identify some challenges and opportunities for reusing data cleaning workflows. We present a high-level conceptual model to clarify what we mean by reusability and propose ways to improve reusability along different dimensions. We use the opportunity of presenting at IDCC to invite the community to share their use cases, experiences, and desiderata for the reuse of data cleaning workflows and recipes in order to foster new collaborations and guide future work.

This lightning talk was given on 2022-06-14 at the 17th International Digital Curation Conference (IDCC22).

Details

Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....9a9749fd1a799bf83252aa2a6058800f
Full Text :
https://doi.org/10.5281/zenodo.6645542