DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones



Detecting code clones has many software engineering applications. Existing approaches either do not scale to large code bases or are not robust against minor code modifications. In this paper, we present an efficient algorithm for identifying similar subtrees and apply it to tree representations of source code. Our algorithm is based on a novel characterization of subtrees with numerical vectors in the Euclidean space R^n and an efficient algorithm to cluster these vectors w.r.t. the Euclidean distance metric. Subtrees with vectors in one cluster are considered similar. We have implemented our tree similarity algorithm as a clone detection tool called DECKARD and evaluated it on large code bases written in C and Java including the Linux kernel and JDK. Our experiments show that DECKARD is both scalable and accurate. It is also language independent, applicable to any language with a formally specified grammar.
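
The idea in the abstract can be illustrated with a small sketch. It is not DECKARD's implementation: it assumes that a subtree's vector simply counts how often each node type occurs, it uses Python's own `ast` module as the tree representation, and it swaps the paper's efficient clustering for a naive pairwise comparison that would not scale. The names `characteristic_vector`, `similar_functions`, and the `threshold` parameter are hypothetical.

```python
import ast
import math
from collections import Counter


def characteristic_vector(node):
    """Count occurrences of each AST node type in the subtree rooted at node.

    Assumption: node-type counts stand in for the paper's vector characterization.
    """
    return Counter(type(n).__name__ for n in ast.walk(node))


def euclidean_distance(u, v):
    """Euclidean distance between two sparse count vectors (missing keys count as 0)."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u[k] - v[k]) ** 2 for k in keys))


def similar_functions(source, threshold=2.0):
    """Return pairs of function names whose vectors lie within `threshold`.

    Naive O(n^2) comparison, used here only for illustration; it is not
    the efficient clustering the abstract refers to.
    """
    tree = ast.parse(source)
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    vectors = [(f.name, characteristic_vector(f)) for f in funcs]
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if euclidean_distance(vectors[i][1], vectors[j][1]) <= threshold:
                pairs.append((vectors[i][0], vectors[j][0]))
    return pairs


SAMPLE = '''
def total(xs):
    s = 0
    for x in xs:
        s = s + x
    return s

def product(ys):
    p = 1
    for y in ys:
        p = p * y
    return p

def greet(name):
    print("hello", name)
'''

if __name__ == "__main__":
    print(similar_functions(SAMPLE, threshold=2.0))
```

In this sample, `total` and `product` differ only in identifier names, one constant, and one operator, so their vectors are close together, while `greet` has a different shape and lies far from both. Because the vectors ignore identifier names, renaming a variable does not change the distance, which is one way to read the abstract's point about robustness to minor modifications.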

Ghassan Misherghi, Lingxiao Jiang, Stéphane Glondu, Zhendong Su


Efficient Algorithm | ICSE 2007 | Large Code Bases | Software Engineering | Tree Similarity Algorithm |


Post Info

Added	09 Dec 2009
Updated	09 Dec 2009
Type	Conference
Year	2007
Where	ICSE
Authors	Ghassan Misherghi, Lingxiao Jiang, Stéphane Glondu, Zhendong Su
