Accelerating Tabular Inference: Training Data Generation with TENET

Veltri, E.; Santoro, D.; Bussotti, J.-F.; Papotti, P.

Donatello Santoro: University of Basilicata, Potenza, Italy. Paolo Papotti: EURECOM, Biot, France.

doi:10.14778/3750601.3750657

Abstract

Tabular Natural Language Inference (TNLI) involves machine learning models that assess whether structured tabular data supports or contradicts a hypothesis formulated in natural language. TNLI models typically require large sets of training examples, which are costly to produce manually. In this demonstration, we present Tenet, a system for the automatic generation of training examples for TNLI applications. Existing TNLI training approaches either depend on costly human annotation or generate simplistic examples that lack data diversity and complex reasoning. In contrast, Tenet can start from a small set of manually annotated examples to automatically generate a large and diverse training dataset. Tenet is based on the idea that SQL queries are the right tool for obtaining rich and complex generated examples. To ensure data variety, evidence queries extract cell values from tables based on diverse data patterns. Once the relevant data are identified, semantic queries define different ways to interpret them using SQL clauses. These interpretations are then verbalized as text to create annotated examples for TNLI. This demonstration offers an interactive experience in which users can select evidence from tabular data, inspect and refine generated queries, and observe how Tenet transforms structured data into natural language hypotheses.

By engaging with different scenarios, users will see how Tenet enables the rapid creation of high-quality TNLI datasets, leading to inference models whose performance is comparable to that of models trained on manually crafted examples.

Figure 1: Given a table and selected cells, Tenet creates training …

Table: Person
	Name	Age	City
t1	Mike	47	SF
t2	Anne	22	NY
t3	John	19	SF

Example A. Claim: "Mike and Anne come from the same city". Label: Refutes. Evidence cells: {t1.Name: "Mike", t2.Name: "Anne", t1.City: "SF", t2.City: "NY"}
Example B. Claim: "Mike is older than Anne". Label: Supports. Evidence cells: {t1.Name: "Mike", t2.Name: "Anne", t1.Age: 47, t2.Age: 22}
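To make the query-based pipeline concrete, here is a minimal Python sketch over the Person table from Figure 1, using the standard-library `sqlite3` module. It is not Tenet's implementation: the evidence query (every pair of distinct tuples), the two semantic queries, the template-based verbalization, and the names `build_person_table`, `EVIDENCE_QUERY`, `SEMANTIC_QUERIES`, and `generate_examples` are hypothetical stand-ins chosen for illustration. A claim is labeled Supports when its SQL condition holds on the selected tuples and Refutes otherwise.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Example:
    claim: str                                    # verbalized hypothesis
    label: str                                    # "Supports" or "Refutes"
    evidence: dict = field(default_factory=dict)  # cell id -> cell value


def build_person_table() -> sqlite3.Connection:
    """Load the Person table from Figure 1 into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # allow access to result columns by name
    conn.execute("CREATE TABLE Person (tid TEXT PRIMARY KEY, Name TEXT, Age INTEGER, City TEXT)")
    conn.executemany(
        "INSERT INTO Person VALUES (?, ?, ?, ?)",
        [("t1", "Mike", 47, "SF"), ("t2", "Anne", 22, "NY"), ("t3", "John", 19, "SF")],
    )
    return conn


# Hypothetical evidence query: a single data pattern, every pair of distinct tuples.
EVIDENCE_QUERY = """
SELECT a.tid AS a_tid, a.Name AS a_Name, a.Age AS a_Age, a.City AS a_City,
       b.tid AS b_tid, b.Name AS b_Name, b.Age AS b_Age, b.City AS b_City
FROM Person a JOIN Person b ON a.tid < b.tid
ORDER BY a.tid, b.tid
"""

# Hypothetical semantic queries: (SQL condition, verbalization template, attribute used).
# Each condition is one possible interpretation of the selected evidence.
SEMANTIC_QUERIES = [
    ("a.City = b.City", "{a} and {b} come from the same city", "City"),
    ("a.Age > b.Age", "{a} is older than {b}", "Age"),
]


def generate_examples(conn: sqlite3.Connection) -> list:
    """Apply every semantic query to every evidence pair and label the result."""
    examples = []
    for row in conn.execute(EVIDENCE_QUERY).fetchall():
        for condition, template, attr in SEMANTIC_QUERIES:
            # Conditions come from the fixed list above, so interpolating them is safe here.
            holds = conn.execute(
                f"SELECT 1 FROM Person a, Person b WHERE a.tid = ? AND b.tid = ? AND {condition}",
                (row["a_tid"], row["b_tid"]),
            ).fetchone() is not None
            # Evidence cells: both names plus the attribute the condition inspects.
            evidence = {
                f"{row['a_tid']}.Name": row["a_Name"],
                f"{row['b_tid']}.Name": row["b_Name"],
                f"{row['a_tid']}.{attr}": row[f"a_{attr}"],
                f"{row['b_tid']}.{attr}": row[f"b_{attr}"],
            }
            claim = template.format(a=row["a_Name"], b=row["b_Name"])
            examples.append(Example(claim, "Supports" if holds else "Refutes", evidence))
    return examples
```

On this three-row table the sketch yields six labeled examples, and the first two match Examples A and B in Figure 1. Fixed templates keep the sketch deterministic; a production system could verbalize query results in richer ways.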

Year of publication

2025

Appears in type:

4.1 Conference proceedings contribution

Files in this item:

File	Size	Format
p5303-veltri.pdf (open access; type: publisher's PDF; license: Creative Commons)	805.82 kB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11563/204356
