Towards Automatic Clustering of Similar Pages in Web Applications
Scanniello, Giuseppe
2009-01-01
Abstract
In this paper, we propose an automatic approach to group web pages that are similar at the content level. The approach uses the Levenshtein string edit distance and Latent Semantic Indexing to compute the dissimilarity between pages, and then iteratively groups the pages using a graph-theoretic clustering algorithm. To automate the clustering process, a prototype has been implemented and used to assess the proposed approach on three web sites.
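The abstract does not detail the clustering algorithm. As a rough illustration only, the Python sketch below normalizes the Levenshtein distance to a dissimilarity in [0, 1], links pages whose dissimilarity falls below a threshold, and takes the connected components of that graph as clusters; a cluster whose largest internal dissimilarity still exceeds a bound is re-clustered with a lower threshold. The threshold-graph algorithm, this iteration rule, the normalization, and the names `cluster_pages`, `threshold`, `step` and `max_diameter` are assumptions made for this example, not details taken from the paper, and the Latent Semantic Indexing component is not sketched.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def dissimilarity(a: str, b: str) -> float:
    """Edit distance divided by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def connected_components(nodes, edges):
    """Connected components of an undirected graph, via union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for u, v in edges:
        parent[find(u)] = find(v)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


def cluster_pages(pages, threshold=0.5, step=0.1, max_diameter=0.3):
    """Hypothetical iterative threshold-graph clustering of pages.

    pages: dict of page name -> page text (hypothetical input format).
    Returns a sorted list of sorted page-name groups.
    """
    if step <= 0:
        raise ValueError("step must be positive so the loop terminates")
    dist = {frozenset(pair): dissimilarity(pages[pair[0]], pages[pair[1]])
            for pair in combinations(pages, 2)}

    def d(u, v):
        return dist[frozenset((u, v))]

    clusters = []
    pending = [(list(pages), threshold)]
    while pending:
        nodes, t = pending.pop()
        # Threshold graph: an edge joins two pages whose dissimilarity is below t.
        edges = [(u, v) for u, v in combinations(nodes, 2) if d(u, v) < t]
        for comp in connected_components(nodes, edges):
            diameter = max((d(u, v) for u, v in combinations(comp, 2)), default=0.0)
            if diameter <= max_diameter or len(comp) == 1:
                clusters.append(sorted(comp))
            else:
                # Still too heterogeneous: split it again with a stricter threshold.
                pending.append((comp, t - step))
    return sorted(clusters)
```

Taking the connected components of a threshold graph is equivalent to single-linkage clustering at that threshold, which can chain dissimilar pages together through intermediate ones; the diameter check is one simple way to break such chains, and it guarantees termination because the threshold keeps dropping until only singletons remain.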
File | Size | Format | Type | License | Access |
---|---|---|---|---|---|
WSE_2009_CR.pdf | 510.06 kB | Adobe PDF | Post-print | DRM not defined | Authorized users only |
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.