
ICWE
2005
Springer

Identifying Websites with Flow Simulation

In this paper, we present a method for discovering the set of webpages that make up a logical website, based on the link structure of the Web graph. Such a method is useful for Web archiving and for computing website importance. To identify the boundaries of a website, we combine two algorithms. The first is an online version of the preflow-push algorithm, which solves the maximum flow problem in traffic networks. The second is the Markov CLuster (MCL) algorithm. MCL is applied to a crawled portion of the Web graph to build an initial seed of webpages, and the preflow-push algorithm then extends this seed. We describe an experiment on a subsite of the INRIA website.
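The flow-based boundary step can be illustrated with a small sketch. It is not the paper's method. It runs a standard offline FIFO preflow-push (push-relabel) maximum-flow computation rather than the online variant the abstract mentions, and it uses a hand-picked seed instead of one built by MCL. It also borrows the minimum-cut Web-community construction associated with Flake, Lawrence and Giles: a virtual source feeds the seed pages, every page drains into a virtual sink with capacity `alpha`, and hyperlinks are treated as undirected. The function names `max_flow_preflow_push`, `source_side` and `identify_site`, the capacities `link_cap`, `alpha` and `seed_cap`, and the toy link graph are all hypothetical. The pages left on the source side of the resulting minimum cut play the role of the website.

```python
from collections import deque


def max_flow_preflow_push(n, edges, s, t):
    """Generic FIFO preflow-push (push-relabel) maximum flow (offline, not online).

    n: number of nodes labelled 0..n-1; edges: (u, v, capacity) triples.
    Returns the flow value plus the capacity and (skew-symmetric) flow matrices.
    """
    cap = [[0] * n for _ in range(n)]
    for u, v, c in edges:
        cap[u][v] += c
    flow = [[0] * n for _ in range(n)]
    height = [0] * n
    excess = [0] * n
    height[s] = n
    # Initial preflow: saturate every edge leaving the source.
    for v in range(n):
        if cap[s][v] > 0:
            flow[s][v] = cap[s][v]
            flow[v][s] = -cap[s][v]
            excess[v] += cap[s][v]
            excess[s] -= cap[s][v]
    active = deque(v for v in range(n) if v not in (s, t) and excess[v] > 0)
    while active:
        u = active.popleft()
        # Discharge u completely before moving on to the next active node.
        while excess[u] > 0:
            pushed = False
            for v in range(n):
                residual = cap[u][v] - flow[u][v]
                if residual > 0 and height[u] == height[v] + 1:
                    delta = min(excess[u], residual)
                    flow[u][v] += delta
                    flow[v][u] -= delta
                    excess[u] -= delta
                    excess[v] += delta
                    # v becomes active only if it had no excess before this push.
                    if v not in (s, t) and excess[v] == delta:
                        active.append(v)
                    pushed = True
                    if excess[u] == 0:
                        break
            if not pushed:
                # Relabel: lift u one level above its lowest residual neighbour.
                height[u] = 1 + min(
                    height[v] for v in range(n) if cap[u][v] - flow[u][v] > 0
                )
    return excess[t], cap, flow


def source_side(n, cap, flow, s):
    """Nodes reachable from s in the residual graph: the source side of a minimum cut."""
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if v not in seen and cap[u][v] - flow[u][v] > 0:
                seen.add(v)
                queue.append(v)
    return seen


def identify_site(num_pages, links, seed, link_cap=10, alpha=4, seed_cap=100):
    """Hypothetical min-cut construction: source -> seed pages, every page -> sink."""
    s, t = num_pages, num_pages + 1
    edges = [(s, p, seed_cap) for p in seed]
    edges += [(p, t, alpha) for p in range(num_pages)]
    for u, v in links:  # assumption: links treated as undirected
        edges += [(u, v, link_cap), (v, u, link_cap)]
    value, cap, flow = max_flow_preflow_push(num_pages + 2, edges, s, t)
    return value, source_side(num_pages + 2, cap, flow, s) - {s}


# Two hypothetical three-page "sites" joined by a single cross-site link (2 - 3).
links = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
value, site = identify_site(6, links, seed={0})
print(value, sorted(site))  # 22 [0, 1, 2]
```

In this construction, `alpha` controls the size of the identified set. On this toy graph, `alpha = 4` separates the two clusters. With `alpha = 1` the minimum cut keeps every page on the source side, and with `alpha = 6` it shrinks back to the seed alone. Tuning that trade-off is an assumption of the sketch, not something the abstract specifies.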
Pierre Senellart
Type Conference
Year 2005
Where ICWE
Authors Pierre Senellart