BUNNI: Learning Repair Actions in Rule-driven Data Cleaning
Mecca, Giansalvatore;Santoro, Donatello;Veltri, Enzo
2024-01-01
Abstract
In this work, we address the challenging and open problem of involving non-expert users in data repairing as first-class citizens. Although a large number of proposals have been devoted to cleaning data from the point of view of expert users (IT staff and data scientists), there is a lack of studies from the perspective of non-expert ones. Given a set of available data quality rules, we exploit machine learning techniques to guide the user in identifying the dirty values for each violation and repairing them. We show that, with low user effort, it is possible to identify the values in tuples that can be trusted and the ones that are most likely errors. We show experimentally how this machine-learning approach leads to a unique clean solution of high quality in scenarios where other approaches fail.

File | Size | Format |
---|---|---|
3665930.pdf (open access). Description: "Just Published" version. Type: post-print document. License: not defined. | 872.33 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.