The paper focuses on investigating the combined use of semantic and structural information of programs to support the comprehension tasks involved in the maintenance and reengineering of software systems. Here, semantic refers to the domain specific issues (both problem and development domains) of a software system. The other dimension, structural, refers to issues such as the actual syntactic structure of the program along with the control and data flow that it represents. An advanced information retrieval method, latent semantic indexing, is used to define a semantic similarity measure between software components. Components within a software system are then clustered together using this similarity measure. Simple structural information (i.e., file organization) of the software system is then used to assess the semantic cohesion of the clusters and files, with respect to each other. The measures are formally defined for general application. A set of experiments is presented which de...
Jonathan I. Maletic, Andrian Marcus