On the Effectiveness of Dynamic Modeling in UML: Results from an External Replication
SCANNIELLO, GIUSEPPE
2009-01-01
Abstract
In this paper, we propose an automatic approach to group web pages that are similar at the content level. The approach uses the Levenshtein string edit distance and Latent Semantic Indexing to compute the dissimilarity between pages and then groups the pages by iteratively applying a graph-theoretic clustering algorithm. To automate the clustering process, a prototype has been implemented and used to assess the proposed approach on three web sites.