Detective Gadget: Generic Iterative Entity Resolution over Dirty Data

Authors :
Marcello Buoncristiano
Giansalvatore Mecca
Donatello Santoro
Enzo Veltri
Source :
Data, Vol 9, Iss 12, p 139 (2024)
Publication Year :
2024
Publisher :
MDPI AG, 2024.

Abstract

In the era of Big Data, entity resolution (ER), i.e., the process of identifying which records refer to the same real-world entity, plays a critical role in data-integration tasks, especially in mission-critical applications where accuracy is mandatory, since we want to avoid both merging distinct entities and missing true matches. However, existing approaches struggle with the challenges posed by rapidly changing data and by the presence of dirty data, which require iterative refinement over time. We present Detective Gadget, a novel system for iterative ER that seamlessly integrates data cleaning into the ER workflow. Detective Gadget employs an alias-based hashing mechanism for fast and scalable matching, check functions to detect and correct mismatches, and a human-in-the-loop framework to refine results through expert feedback. The system iteratively improves data quality and matching accuracy by leveraging evidence from both automated and manual decisions. Extensive experiments across diverse real-world scenarios demonstrate its effectiveness, achieving high accuracy and efficiency while adapting to evolving datasets.
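The abstract names an alias-based hashing mechanism and check functions without detailing them. The Python sketch below illustrates one generic way these two ideas can fit together and is not Detective Gadget's implementation. It rests on several assumptions. Every value of a chosen set of attributes (hypothetically `alias_fields`) is treated as an alias. Aliases are normalized and hashed into buckets. Records sharing any bucket are merged with union-find. A check function then flags clusters whose records disagree on a given attribute, which would make them candidates for expert review. The names `normalize`, `resolve`, and `check_conflicts` and the sample records are hypothetical.

```python
from collections import defaultdict

def normalize(value: str) -> str:
    # Hypothetical normalization: lowercase and keep only alphanumerics
    return "".join(ch for ch in value.lower() if ch.isalnum())

def resolve(records, alias_fields):
    """Group records that share at least one normalized alias (illustrative only)."""
    parent = {rid: rid for rid in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Hash each (field, normalized value) alias into a bucket;
    # the first record seen with an alias owns that bucket.
    buckets = {}
    for rid, rec in records.items():
        for field in alias_fields:
            value = rec.get(field)
            if not value:
                continue
            key = (field, normalize(value))
            if key in buckets:
                union(buckets[key], rid)
            else:
                buckets[key] = rid

    clusters = defaultdict(set)
    for rid in records:
        clusters[find(rid)].add(rid)
    return sorted(sorted(c) for c in clusters.values())

def check_conflicts(records, cluster, field):
    """Hypothetical check: True if records in `cluster` disagree on `field`."""
    values = {normalize(records[r][field]) for r in cluster if records[r].get(field)}
    return len(values) > 1

# Usage with made-up records: r1/r2 share a name alias, r1/r3 share a VAT alias.
records = {
    "r1": {"name": "ACME Corp.", "vat": "IT123"},
    "r2": {"name": "acme corp", "vat": None},
    "r3": {"name": "Beta Srl", "vat": "IT123"},
    "r4": {"name": "Gamma", "vat": "IT999"},
}
clusters = resolve(records, ["name", "vat"])
print(clusters)  # [['r1', 'r2', 'r3'], ['r4']]
print(check_conflicts(records, clusters[0], "name"))  # True: flagged for review
```

Here transitive merging through the shared VAT pulls "Beta Srl" into the ACME cluster. This is exactly the kind of mismatch that a check function and human feedback are meant to catch in an iterative workflow.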

Details

Language :
English
ISSN :
2306-5729
Volume :
9
Issue :
12
Database :
Directory of Open Access Journals
Journal :
Data
Publication Type :
Academic Journal
Accession number :
edsdoj.507c58282e14169b23b973df9c73e0d
Document Type :
article
Full Text :
https://doi.org/10.3390/data9120139