Using a Competitive Clustering Algorithm to Comprehend Web Applications

Scanniello, Giuseppe
2006-01-01

Abstract

We propose an approach based on Winner Takes All, a competitive clustering algorithm, to support the comprehension of static and dynamic web applications. The process first computes the distances between web pages and then identifies groups of similar pages with the Winner Takes All clustering algorithm. Two instances of the process are presented, which identify similar pages at the structural and at the content level, respectively. The first instance encodes the structure of each page as a string and then uses the Levenshtein algorithm to compute the distance between each pair of pages. To group pages that are similar in content, the second instance uses Latent Semantic Indexing to represent each page as a vector in the concept space; the Euclidean distances between these vectors are then given as input to the clustering algorithm. A prototype that automates the identification of groups of similar pages has been implemented, and both the approach and the prototype have been assessed in a case study.
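The abstract does not say how page structure is encoded. The following is a minimal Python sketch of the structural instance, assuming (hypothetically) that every HTML start tag is mapped to one character from an alphabet shared by all pages, so that the Levenshtein distance counts tag insertions, deletions, and substitutions. The names `encode_structure`, `levenshtein`, and `structural_distances` are illustrative and are not taken from the prototype.

```python
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Collects start-tag names in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def encode_structure(html: str, alphabet: dict[str, str]) -> str:
    """Hypothetical encoding: one character per start tag; `alphabet` is shared across pages."""
    collector = _TagCollector()
    collector.feed(html)
    collector.close()
    chars = []
    for tag in collector.tags:
        if tag not in alphabet:
            alphabet[tag] = chr(ord("A") + len(alphabet))  # next unused symbol
        chars.append(alphabet[tag])
    return "".join(chars)


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def structural_distances(pages: list[str]) -> list[list[int]]:
    """Pairwise Levenshtein distances between the encoded pages."""
    alphabet: dict[str, str] = {}
    encoded = [encode_structure(p, alphabet) for p in pages]
    n = len(encoded)
    return [[levenshtein(encoded[i], encoded[j]) for j in range(n)] for i in range(n)]


pages = [
    "<html><body><h1>News</h1><p>a</p><p>b</p></body></html>",
    "<html><body><h1>Sport</h1><p>c</p></body></html>",
    "<html><body><table><tr><td>x</td></tr></table></body></html>",
]
print(structural_distances(pages))  # [[0, 1, 3], [1, 0, 3], [3, 3, 0]]
```

The first two pages differ by a single `<p>` element, so their distance is 1, while the table-based page is three edits away from both.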
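For the content instance, here is a minimal NumPy sketch, assuming a plain whitespace tokenizer and raw term frequencies; the abstract states neither the term weighting nor the number of concepts `k`, and the function names are hypothetical. Pages are projected into a `k`-dimensional concept space with a truncated singular value decomposition (one common way of computing LSI), and the Euclidean distance is then taken between every pair of page vectors.

```python
import numpy as np


def term_document_matrix(docs: list[str]) -> tuple[list[str], np.ndarray]:
    """Raw term frequencies; rows are terms, columns are pages (simplified tokenizer)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    a = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokenized):
        for t in toks:
            a[index[t], j] += 1.0
    return vocab, a


def lsi_page_vectors(a: np.ndarray, k: int) -> np.ndarray:
    """Rows of V_k S_k: each page as a point in the k-dimensional concept space."""
    _, s, vt = np.linalg.svd(a, full_matrices=False)  # singular values in descending order
    return vt[:k, :].T * s[:k]  # shape: (n_pages, k)


def euclidean_distances(vectors: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances between the rows of `vectors`."""
    diff = vectors[:, None, :] - vectors[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


docs = [
    "football match goal",
    "football goal score",
    "stock market price",
    "market price index",
]
_, a = term_document_matrix(docs)
d = euclidean_distances(lsi_page_vectors(a, k=2))
print(np.round(d, 3))
```

With `k = 2`, the two football pages collapse onto the same concept vector (distance 0) although they do not share every term, while pages on different topics end up √5 apart.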
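The abstract does not detail which Winner Takes All variant is used. Below is a minimal sketch of online competitive learning on a precomputed distance matrix, assuming that each page is represented by its row of distances to all pages, that the prototypes are seeded deterministically with a farthest-first rule, and that the learning rate decays linearly; `wta_cluster` and its parameters are hypothetical. At every step only the winning prototype, the one nearest to the current page, is moved toward that page, which is what makes the learning competitive.

```python
import numpy as np


def wta_cluster(dist, k: int, epochs: int = 50, lr: float = 0.5) -> list[int]:
    """Assign each page to one of k clusters by winner-takes-all competitive learning."""
    x = np.asarray(dist, dtype=float)  # assumption: row i is page i's distance profile
    n = x.shape[0]
    # Farthest-first seeding (an assumption): start from page 0, then repeatedly
    # add the page whose distance to the already chosen pages is largest.
    chosen = [0]
    while len(chosen) < k:
        nearest = x[:, chosen].min(axis=1)
        chosen.append(int(nearest.argmax()))
    w = x[chosen].copy()  # prototype (neuron weight) vectors
    for epoch in range(epochs):
        rate = lr * (1.0 - epoch / epochs)  # linearly decaying learning rate
        for i in range(n):
            winner = int(np.linalg.norm(w - x[i], axis=1).argmin())
            # Winner takes all: only the closest prototype moves toward the input.
            w[winner] += rate * (x[i] - w[winner])
    # Final assignment: each page goes to its nearest prototype.
    return [int(np.linalg.norm(w - x[i], axis=1).argmin()) for i in range(n)]


levenshtein_matrix = np.array([[0, 1, 3], [1, 0, 3], [3, 3, 0]])
print(wta_cluster(levenshtein_matrix, k=2))  # [0, 0, 1]
```

Any symmetric distance matrix can be passed in, whether it comes from Levenshtein distances between structure strings or from Euclidean distances between LSI vectors.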
Year: 2006
ISBN: 9780769526966
Files in this item: printedPaper.pdf (post-print, Adobe PDF, 137.67 kB; access restricted to authorized users; license: DRM not defined)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11563/13689
Citations
  • PubMed Central: not available
  • Scopus: 6
  • ISI Web of Science: 4