BUNNI: Learning Repair Actions in Rule-driven Data Cleaning

Mecca, Giansalvatore; Santoro, Donatello; Veltri, Enzo
2024-01-01

Abstract

In this work, we address the challenging, open problem of involving non-expert users as first-class citizens in data repairing. Although many proposals have been devoted to cleaning data from the point of view of expert users (IT staff and data scientists), studies from the perspective of non-expert users are lacking. Given a set of available data quality rules, we exploit machine learning techniques to guide the user in identifying the dirty values for each violation and repairing them. We show that, with low user effort, it is possible to identify the values in tuples that can be trusted and those that are most likely errors. We show experimentally that this machine-learning approach leads to a unique clean solution of high quality in scenarios where other approaches fail.
Files in this record:
File: 3665930.pdf (authorized users only)
Description: "Just Published" version
Type: Pre-print document
License: Not defined
Size: 872.33 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11563/180515