tection via Structural Abstraction

William S. Evans
Department of Computer Science
University of British Columbia
Vancouver, B.C. V6T 1Z4, CANADA

Christopher W. Fraser
Microsoft Research
Redmond, WA 98052, USA

August 2005
Technical Report MSR-TR-2005-104

This paper describes the design, implementation, and application of a new algorithm to detect cloned code. It operates on the abstract syntax trees formed by many compilers as an intermediate representation. It extends prior work by identifying clones even when arbitrary subtrees have been changed. On a 16,000-line code corpus, 20-50% of its clones eluded previous methods. The method also identifies cloning in declarations, so it is more general than conventional procedural abstraction.
William S. Evans, Christopher W. Fraser, Fei Ma
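
The abstract's central idea, matching two syntax trees even when arbitrary subtrees differ, can be illustrated with a small sketch. This is not the report's algorithm, which is not spelled out in this section. It is a simple pairwise comparison under stated assumptions: trees are encoded as nested tuples of the form (label, child, ...), and the function name `shared_pattern` and the `"?"` hole marker are hypothetical choices made for illustration.

```python
# Illustrative sketch only: the tuple encoding of trees, the name
# shared_pattern, and the "?" hole marker are assumptions, not the
# report's implementation.

def shared_pattern(a, b, holes):
    """Return the structure common to trees a and b.

    Wherever the two trees differ, the result contains "?" and the
    differing pair of subtrees is appended to `holes`.
    """
    if a == b:
        return a
    same_shape = (
        isinstance(a, tuple) and isinstance(b, tuple)
        and a[0] == b[0] and len(a) == len(b)
    )
    if same_shape:
        # Same node label and arity: keep the node, recurse on children.
        children = tuple(shared_pattern(x, y, holes)
                         for x, y in zip(a[1:], b[1:]))
        return (a[0],) + children
    # Different label, arity, or leaf value: abstract this subtree away.
    holes.append((a, b))
    return "?"


# x = a + 1   versus   y = a + f(b)
t1 = ("assign", ("var", "x"), ("add", ("var", "a"), ("const", 1)))
t2 = ("assign", ("var", "y"),
      ("add", ("var", "a"), ("call", "f", ("var", "b"))))

holes = []
pattern = shared_pattern(t1, t2, holes)
print(pattern)  # ('assign', ('var', '?'), ('add', ('var', 'a'), '?'))
```

In this toy setting the two assignments share one pattern with two holes, and the second hole stands for whole subtrees of different shape (a constant versus a call). A purely lexical or token-level comparison that only tolerates renamed identifiers would not pair these two fragments.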