An Approach to Identify Duplicated Web Pages

A significant consequence of the continuing expansion of the Web and e-commerce is growing demand for new Web sites and Web applications. The software industry is facing this opportunity under the pressure of very short time-to-market and extremely high competition. As a result, Web sites and applications are usually developed without a formalized process: Web pages are coded directly and incrementally, with new pages obtained by duplicating existing ones. Duplicated Web pages, which have the same structure and differ only in the data they contain, can be considered clones. Identifying clones may reduce the effort devoted to testing, maintaining, and evolving Web sites and applications. Moreover, clone detection across different Web sites can reveal cases of possible plagiarism. In this paper we propose an approach, based on similarity metrics, to detect duplicated pages in Web sites and applications implemented with HTML and ASP technology. ...
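As a rough illustration of the idea in the abstract (pages that share a structure but differ in data), the sketch below compares two HTML pages by their tag sequences only. It is not the authors' method: the abstract is truncated and does not say which similarity metrics are used, so the choice of `difflib.SequenceMatcher`, the helper names (`tag_sequence`, `structural_similarity`, `is_clone_candidate`), and the 0.9 threshold are all assumptions made for this example.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records opening and closing tags in order; text and attributes are ignored."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_endtag(self, tag):
        self.tags.append("/" + tag)


def tag_sequence(html):
    """Return the list of tags in a page, e.g. ['html', 'body', '/body', '/html']."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def structural_similarity(html_a, html_b):
    """Similarity in [0, 1] between the tag sequences of two pages.

    Assumption: SequenceMatcher.ratio() stands in for whatever
    similarity metric the paper actually defines.
    """
    matcher = SequenceMatcher(
        None, tag_sequence(html_a), tag_sequence(html_b), autojunk=False
    )
    return matcher.ratio()


def is_clone_candidate(html_a, html_b, threshold=0.9):
    """Flag a pair of pages as possible clones (threshold is hypothetical)."""
    return structural_similarity(html_a, html_b) >= threshold


page_a = "<html><body><h1>Apples</h1><p>Red fruit</p></body></html>"
page_b = "<html><body><h1>Pears</h1><p>Green fruit</p></body></html>"
page_c = "<html><body><table><tr><td>x</td></tr></table></body></html>"

print(is_clone_candidate(page_a, page_b))  # same tags, different data
print(is_clone_candidate(page_a, page_c))  # different structure
```

Because only tags are compared, `page_a` and `page_b` score as structurally identical even though their text differs, which matches the abstract's notion of a clone. A real tool would also need to handle server-side code such as ASP, which this sketch ignores.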

Giuseppe A. Di Lucca, Massimiliano Di Penta, Anna Rita Fasolino

COMPSAC 2002 | Software Engineering | Web Applications | Web Pages | Web Sites |

» Revealing Hidden Community Structures and Identifying Bridges in Complex Networks An Appli...

» Cleaning Web Pages for Effective Web Content Mining

» An NGram Based Approach to Automatically Identifying Web Page Genre

» Understanding Content Reuse on the Web Static and Dynamic Analyses

» Identifying and Filtering NearDuplicate Documents

» SpotSigs robust and efficient near duplicate detection in large web collections

» Applying Semantic Links for Classifying Web Pages

» An Instancebased Approach for Identifying Candidate Ontology Relations within a MultiAgent...

Post Info

Added	14 Jul 2010
Updated	14 Jul 2010
Type	Conference
Year	2002
Where	COMPSAC
Authors	Giuseppe A. Di Lucca, Massimiliano Di Penta, Anna Rita Fasolino
