Cloning is extremely likely to occur in web sites, much more so than in other software. While some clones exist for valid reasons or are too small to be worth eliminating, cloning percentages of 30% or higher, which are not uncommon in web sites, suggest that some improvements can be made. Finding and resolving clones in web documents is challenging, however: syntax errors and the routine use of multiple languages complicate parsing the documents and finding clones, while the lack of native code reuse tools forces the analyst to rely on other technologies for resolution. Here we present a way to find clones in multilingual web documents and to resolve them using one of several code reuse techniques available in a dynamic web site. Rather than picking a single resolution technique and relying on it exclusively, we choose the technique based on the clone in question, so as to minimize disruption to the structure of the original documents.

1 Removing Clones From Web Pages

Previous research indicates that all softwa...
Nikita Synytskyy, James R. Cordy, Thomas R. Dean