Using n-grams to rapidly characterise the evolution of software code



Text-based approaches to the analysis of software evolution are attractive because of the fine-grained, token-level comparisons they can generate. The use of such approaches has, however, been constrained by the lack of an efficient implementation. In this paper we demonstrate the ability of Ferret, which uses n-grams of 3 tokens, to characterise the evolution of software code. Ferret’s implementation operates in almost linear time and is at least an order of magnitude faster than the diff tool. Ferret’s output can be analysed to reveal several characteristics of software evolution, such as: the lifecycle of a single file, the degree of change between two files, and possible regression. In addition, the similarity scores produced by Ferret can be aggregated to measure larger parts of the system being analysed.
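To make the idea of comparing files by n-grams of 3 tokens concrete, the following is a minimal sketch and not Ferret's actual implementation. It assumes a crude regex tokenizer, a Jaccard-style set resemblance as the similarity score, and a plain mean as the aggregation over file pairs; none of these choices are specified in the abstract, and the function names `tokens`, `trigrams`, `similarity`, and `mean_similarity` are hypothetical.

```python
import re


def tokens(source: str) -> list[str]:
    # Assumed tokenizer: identifiers, runs of digits, or any single
    # non-whitespace symbol. A real tool would use a language-aware lexer.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def trigrams(source: str) -> set[tuple[str, ...]]:
    # Set of all runs of 3 consecutive tokens in the text.
    toks = tokens(source)
    return {tuple(toks[i:i + 3]) for i in range(len(toks) - 2)}


def similarity(a: str, b: str) -> float:
    # Assumed score: shared trigrams divided by all distinct trigrams
    # (Jaccard coefficient). 1.0 means identical trigram sets.
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def mean_similarity(pairs: list[tuple[str, str]]) -> float:
    # Assumed aggregation: average the per-pair scores, e.g. over
    # successive versions of every file in a directory.
    scores = [similarity(a, b) for a, b in pairs]
    return sum(scores) / len(scores) if scores else 0.0


v1 = "x = a + b;"
v2 = "x = a + c;"
print(similarity(v1, v1))  # identical versions
print(similarity(v1, v2))  # one token changed
```

Because each file is reduced to a set of trigrams in a single pass, building the sets takes time roughly proportional to the number of tokens, which is in the spirit of the near-linear behaviour the abstract describes. A low score between successive versions of a file suggests heavy change, and a later version scoring higher against an early version than against its immediate predecessor could hint at a regression.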

Austen Rainer, Peter C. R. Lane, James A. Malcolm, Sven-Bodo Scholz


Ferret’s Implementation | KBSE 2008 | Software Engineering | Software Evolution | Text-based Approaches |


Post Info

Added	31 May 2010
Updated	31 May 2010
Type	Conference
Year	2008
Where	KBSE
Authors	Austen Rainer, Peter C. R. Lane, James A. Malcolm, Sven-Bodo Scholz
