A number of clustering based approaches and tools have been proposed in the past to partition a software system into subsystems. The greater part of these approaches is semiautomatic, thus requiring human decision to identify the best partition of software entities into clusters among the possible partitions. In addition, some approaches are conceived for software systems implemented using a particular programming language (e.g., C and C++). In this paper we present an approach to automate the partitioning of a given software system into subsystems. In particular, the approach first analyzes the software entities (e.g., programs or classes) and then using Latent Semantic Indexing the dissimilarity between these entities is computed. Finally, software entities are grouped using iteratively the k-means clustering algorithm. The approach has been implemented in a prototype of a supporting software system as an Eclipse plug-in. Finally, to assess the approach and the plug-in, we have conducted an empirical investigation on three open source software systems implemented using the programming languages Java and C/C++.

Architecture Recovery using Latent Semantic Indexing and K-Means: an Empirical Evaluation

SCANNIELLO, GIUSEPPE;
2010-01-01

Abstract

A number of clustering based approaches and tools have been proposed in the past to partition a software system into subsystems. The greater part of these approaches is semiautomatic, thus requiring human decision to identify the best partition of software entities into clusters among the possible partitions. In addition, some approaches are conceived for software systems implemented using a particular programming language (e.g., C and C++). In this paper we present an approach to automate the partitioning of a given software system into subsystems. In particular, the approach first analyzes the software entities (e.g., programs or classes) and then using Latent Semantic Indexing the dissimilarity between these entities is computed. Finally, software entities are grouped using iteratively the k-means clustering algorithm. The approach has been implemented in a prototype of a supporting software system as an Eclipse plug-in. Finally, to assess the approach and the plug-in, we have conducted an empirical investigation on three open source software systems implemented using the programming languages Java and C/C++.
2010
9781424482894
File in questo prodotto:
File Dimensione Formato  
printed paper.pdf

solo utenti autorizzati

Tipologia: Documento in Post-print
Licenza: DRM non definito
Dimensione 594.98 kB
Formato Adobe PDF
594.98 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11563/13988
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 24
  • ???jsp.display-item.citation.isi??? ND
social impact