—There are numerous studies that examine whether or not cloned code is harmful to software systems. Yet, few of them study which characteristics of cloned code in particular lead to software defects. In our work, we use survival analysis to understand the impact of clones on software defects and to determine the characteristics of cloned code that have the highest impact on software defects. Our survival models express the risk of defects in terms of basic predictors inherent to the code (e.g., LOC) and cloning predictors (e.g., number of clone siblings). We perform a case study using two clone detection tools on two large, long-lived systems using survival analysis. We determine that the defect-proneness of cloned methods is specific to the system under study and that more resources should be directed towards methods with a longer 'commit history'.
Gehan M. K. Selim, Liliane Barbour, Weiyi Shang, B