
RoadRunner: Towards Automatic Data Extraction from Large Web Sites

MECCA, Giansalvatore
2001

Abstract

The paper investigates techniques for extracting data from HTML sites through the use of automatically generated wrappers. To automate the wrapper generation and the data extraction process, the paper develops a novel technique to compare HTML pages and generate a wrapper based on their similarities and differences. Experimental results on real-life data-intensive Web sites confirm the feasibility of the approach.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/11563/9626

Citations
  • Scopus 755