In world wide web, a document is usually made up of multiple pages, each one of which has a unique URL address and links to each other by hyperlink pointers. Related documents are...
Recovering semantic relations between different parts of web pages are of great importance for multi-platform web interface development, as they make it possible to re-distribute ...
Users of the World-Wide Web are not only confronted by an immense overabundance of information, but also by a plethora of tools for searching for the web pages that suit their inf...
Web masters usually place certain web pages such as home pages and index pages in front of others. Under such a design, it is necessary to go through some pages to reach the desti...
Broder et al.’s [3] shingling algorithm and Charikar’s [4] random projection based approach are considered “state-of-theart” algorithms for finding near-duplicate web pag...