An Automatic Data Grabber for Large Web Sites

MECCA, Giansalvatore;
2004-01-01

Abstract

We demonstrate a system to automatically grab data from data-intensive web sites. The system first infers a model that describes the web site at the intensional level as a collection of classes; each class represents a set of structurally homogeneous pages and is associated with a small set of representative pages. Based on the model, a library of wrappers, one per class, is then inferred with the help of an external wrapper generator. The model, together with the library of wrappers, can thus be used to navigate the site and extract the data.
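
The structure described above (a site model made of page classes, each with a few sample pages and one wrapper) can be pictured with a minimal sketch. This is only an illustration under stated assumptions, not the system's actual implementation: the names `PageClass`, `SiteModel`, and `extract` are hypothetical, the URLs are placeholders, and a regular expression stands in for the wrapper that an external wrapper generator would produce.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PageClass:
    """One class of structurally homogeneous pages (hypothetical type)."""
    name: str
    sample_urls: list            # small set of representative pages
    wrapper: re.Pattern = None   # assumption: a regex stands in for a generated wrapper


@dataclass
class SiteModel:
    """Intensional description of a site as a collection of page classes."""
    classes: dict = field(default_factory=dict)

    def add_class(self, page_class):
        self.classes[page_class.name] = page_class


def extract(model, class_name, html):
    """Apply the wrapper of the given class to one page and return its fields."""
    wrapper = model.classes[class_name].wrapper
    match = wrapper.search(html)
    return match.groupdict() if match else {}


# Usage with made-up data: one class, two placeholder sample pages, one wrapper.
model = SiteModel()
model.add_class(PageClass(
    name="product",
    sample_urls=["http://example.com/p/1", "http://example.com/p/2"],
    wrapper=re.compile(r"<h1>(?P<title>.*?)</h1>\s*<span>(?P<price>.*?)</span>"),
))
record = extract(model, "product", "<h1>Lamp</h1> <span>12.50</span>")
```

Keeping one wrapper per class, rather than one per page, mirrors the abstract's point that pages of the same class share a structure and can therefore share an extraction rule.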
2004
0120884690

Use this identifier to cite or link to this document: https://hdl.handle.net/11563/9578