Identifying Clones in Dynamic Web Sites Using Similarity Thresholds

SCANNIELLO, GIUSEPPE
2004-01-01

Abstract

We propose an approach to automatically detect duplicated pages in dynamic Web sites. The approach analyzes both the page structure, implemented by specific sequences of HTML tags, and the displayed content. For each pair of dynamic pages, we also consider the similarity degree of their scripting source code. The similarity degree of two pages is computed with similarity metrics based on the Levenshtein string edit distance, using a different metric for each part of a Web page. We have implemented a prototype that automates clone detection for Web applications developed with JSP technology, and we used it to validate the approach in a case study.
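
The idea of comparing page structure through the edit distance of HTML tag sequences can be illustrated with a minimal Python sketch. It is not the prototype described above: the names `tag_sequence`, `similarity`, and `is_structural_clone`, the normalization by the longer sequence, and the default threshold of 0.9 are all assumptions made for illustration only.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects the sequence of opening and closing tag names in a page."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_endtag(self, tag):
        self.tags.append("/" + tag)


def tag_sequence(html):
    """Return the page structure as a list of tag names (hypothetical helper)."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def levenshtein(a, b):
    """Edit distance between two sequences; each insert, delete, or substitute costs 1."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def similarity(a, b):
    """Assumed normalization: 1.0 for identical sequences, 0.0 for fully different ones."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def is_structural_clone(html_a, html_b, threshold=0.9):
    """Flag two pages as structural clones if similarity reaches the assumed threshold."""
    return similarity(tag_sequence(html_a), tag_sequence(html_b)) >= threshold
```

The same `similarity` function could be applied to the displayed text or to the scripting code of each page, with a separate threshold per part; how the original approach combines those parts is not specified in this abstract.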
Year: 2004
ISBN: 9789728865009
Files in this item:
File: ICEIS_2004.pdf
Access: authorized users only
Type: Pre-print document
License: DRM not defined
Size: 273.69 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11563/13801
Citations
  • Scopus: 15