Modern source-control systems, such as Subversion, preserve change-sets of files as atomic commits. However, the order in which the files were changed is typically not recorded in these source-code repositories. This paper presents a set of heuristics for grouping change-sets (i.e., log entries) found in source-code repositories. From such groups of change-sets, sequences of files that frequently change together are uncovered. The approach yields not only the (unordered) sets of files but also (partial temporal) ordering information for them. The technique is demonstrated on a subset of the KDE source-code repository. The results show that the approach is able to find sequences of changed files.

Categories and Subject Descriptors
D.2.7. [Software Engineering]: Distribution, Maintenance, and Enhancement – documentation, enhancement, extensibility, version control

General Terms
Management, Experimentation

Keywords
Mining Software Repositories, Heuristics,...
Huzefa H. Kagdi, Shehnaaz Yusuf, Jonathan I. Malet