Large multi-platform software systems comprising millions of lines of code evolve to cope with new platforms or to meet ever-changing user needs. While several studies have focused on the similarity of code fragments or modules, few have addressed the need to monitor overall system evolution. Meanwhile, the decision to evolve or refactor a large software system needs to be supported by high-level information that represents the overall picture of the system while abstracting from unnecessary details. This paper proposes to extend the concept of similarity of code fragments to quantify similarities at the release/system level. Similarities are captured by four software metrics representative of the commonalities and differences within and among software artifacts. To show the feasibility of characterizing large software systems with the new metrics, 365 releases of the Linux kernel were analyzed. The metrics, the experimental results, and the lessons learned are presented in the paper.
Ettore Merlo, Michel Dagenais, P. Bachand, J. S. S