A Framework for the Generation of Training Examples from Tabular Data
Santoro D.; Veltri E.
2024-01-01
Abstract
Tabular data is becoming increasingly important in Tabular Natural Language Inference (TNLI), where the goal is to assess whether a table supports or refutes a given hypothesis expressed in natural language text. A major issue in TNLI is the lack of training data. Existing approaches rely on manual annotation of new training data or on simple augmentation techniques that lack data variety and complexity. We present a system, Tenet, that automatically generates new training examples for TNLI applications in different domains. Our framework exploits SQL queries in two ways. Evidence-queries introduce data variety by identifying new cell values over the data according to different data patterns. Semantic-queries introduce complexity by describing the different ways such data can be identified through SQL queries. Descriptions from the semantic-queries are used to verbalize the new cell values from the evidence-queries using a Pretrained Language Model (PLM). The verbalized sentence and the cell values can be used as a new training example in the target TNLI application. We show that Tenet generates human-like examples that are comparable with manually written examples.
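The pipeline described in the abstract can be illustrated with a minimal sketch. It is not Tenet's implementation: the `players` table, the `EVIDENCE_SQL` query, the `generate_examples` function, and the output fields are all hypothetical, and a fixed sentence template stands in for the PLM verbalization step. Producing the refuted hypothesis by swapping the two names is likewise an assumption made only for this example.

```python
import sqlite3

# Toy table; the schema and rows are invented for illustration.
ROWS = [("Ann", "Red", 31), ("Bob", "Blue", 24), ("Cy", "Red", 18)]

# Hypothetical evidence query: select a pair of rows that fits a
# "greater than" comparison pattern over the points column.
EVIDENCE_SQL = """
SELECT a.name, a.points, b.name, b.points
FROM players AS a JOIN players AS b ON a.points > b.points
ORDER BY a.points DESC, b.points DESC
LIMIT 1
"""


def generate_examples():
    """Return one supported and one refuted example for the toy table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE players (name TEXT, team TEXT, points INTEGER)")
        conn.executemany("INSERT INTO players VALUES (?, ?, ?)", ROWS)
        a_name, a_pts, b_name, b_pts = conn.execute(EVIDENCE_SQL).fetchone()
    finally:
        conn.close()

    # The selected cell values act as the evidence for both hypotheses.
    evidence = [(a_name, a_pts), (b_name, b_pts)]

    # Template stand-in for the PLM verbalization step (assumption).
    supported = f"{a_name} scored more points than {b_name}."
    # Swapping the names yields a sentence the evidence contradicts (assumption).
    refuted = f"{b_name} scored more points than {a_name}."

    return [
        {"evidence": evidence, "hypothesis": supported, "label": "supports"},
        {"evidence": evidence, "hypothesis": refuted, "label": "refutes"},
    ]


if __name__ == "__main__":
    for example in generate_examples():
        print(example)
```

In this sketch the SQL query plays the role of an evidence-query, while the comparison it encodes (`a.points > b.points`) is the kind of information a semantic-query description would pass to the verbalization step.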