On the Effectiveness of Dynamic Modeling in UML: Results from an External Replication

SCANNIELLO, GIUSEPPE
2009

Abstract

In this paper, we propose an automatic approach for grouping web pages that are similar at the content level. The approach uses the Levenshtein string edit distance and Latent Semantic Indexing to compute the dissimilarity between pages, and then groups the pages by iteratively applying a graph-theoretic clustering algorithm. To automate the clustering process, a prototype has been implemented and used to assess the proposed approach on three web sites.
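
The pipeline named in the abstract can be illustrated with a minimal sketch. It is not the authors' prototype: it covers only the Levenshtein part of the dissimilarity (Latent Semantic Indexing is omitted), and it assumes one simple graph-theoretic grouping, namely the connected components of a graph that links two pages when their normalized edit distance is at or below a threshold. The function names, the normalization by the longer page length, and the threshold are all hypothetical choices made for illustration.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def dissimilarity(a: str, b: str) -> float:
    """Assumed normalization: edit distance divided by the longer length, in [0, 1]."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def cluster_pages(pages: dict[str, str], threshold: float) -> list[set[str]]:
    """Group pages as connected components of the 'dissimilarity <= threshold' graph."""
    parent = {name: name for name in pages}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in combinations(pages, 2):
        if dissimilarity(pages[u], pages[v]) <= threshold:
            parent[find(u)] = find(v)  # merge the two components

    groups: dict[str, set[str]] = {}
    for name in pages:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())
```

Calling `cluster_pages` with a mapping from page names to page text and a threshold returns a list of sets of page names. Lowering the threshold yields more, smaller groups; an iterative variant could re-run the grouping with different thresholds, although the abstract does not say how the iteration is actually performed.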
ISBN: 9781424451241
Files in this item:
File: Printed.pdf (authorized users only)
Type: Post-print document
License: DRM not defined
Size: 981.28 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: http://hdl.handle.net/11563/13936