A semantic and service-based approach for adaptive multi-structured data curation in data lakehouses.

Authors :
Zouari, Firas
Ghedira-Guegan, Chirine
Boukadi, Khouloud
Kabachi, Nadia
Source :
World Wide Web. Nov 2023, Vol. 26, Issue 6, pp. 4001-4023. 23 pp.
Publication Year :
2023

Abstract

Several data management architectures have recently emerged to cope with the challenges imposed by big data. Among them, data lakehouses are receiving much interest from industry and academia because they can hold disparate multi-structured batch and streaming data sources in a single data repository. The heterogeneity and complexity of such data call for a dedicated process to improve its quality and extract value from it. Data curation is that process: it encompasses several tasks that clean and enrich data so that it continues to fit user requirements. Nevertheless, most existing data curation approaches lack the dynamism, flexibility, and customization needed to build a data curation pipeline that aligns with end-user requirements, which may vary with the user's decision context. Moreover, they curate batch data sources of only a single structure type (e.g., semi-structured). Considering the changing requirements of users and the need to build a curation pipeline customized to both user and data source characteristics, we propose a service-based framework for adaptive data curation in data lakehouses that encompasses five modules: data collection, data quality evaluation, data characterization, curation service composition, and data curation. The proposed framework is built upon a new modular ontology for data characterization and evaluation and a curation service composition approach, both detailed in this paper. The experimental findings validate the contributions' performance in terms of effectiveness and execution time. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
1386-145X
Volume :
26
Issue :
6
Database :
Academic Search Index
Journal :
World Wide Web
Publication Type :
Academic Journal
Accession number :
174525979
Full Text :
https://doi.org/10.1007/s11280-023-01218-3