Sciweavers

SIGIR
2008
ACM

Pagerank based clustering of hypertext document collections

Clustering hypertext document collections is an important task in Information Retrieval. Most clustering methods are based on document content and do not take hypertext links into account. Here we propose a novel PageRank based clustering (PRC) algorithm that uses the hypertext structure. The PRC algorithm produces graph partitionings with high modularity and coverage. A comparison of the PRC algorithm with two content based clustering algorithms shows a good match between PRC clustering and content based clustering.
Categories and Subject Descriptors: H.3 [Information Search and Retrieval]: Miscellaneous
General Terms: Algorithms, Experiments
Keywords: PageRank based Clustering, Directed Graphs
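The abstract evaluates partitions by modularity and coverage but does not define them here. The sketch below assumes the commonly used definitions for a directed link graph: coverage is the fraction of links that stay inside a cluster, and modularity is that fraction minus the value expected from the clusters' out-degree and in-degree totals. The paper may use different variants. The function names and the toy graph are hypothetical, and the code only scores a given partition; it is not the PRC algorithm.

```python
from collections import defaultdict


def coverage(edges, cluster_of):
    """Fraction of directed links whose two endpoints share a cluster."""
    if not edges:
        return 0.0
    intra = sum(1 for u, v in edges if cluster_of[u] == cluster_of[v])
    return intra / len(edges)


def directed_modularity(edges, cluster_of):
    """Directed modularity, assuming the standard definition
    Q = sum over clusters c of [ e_c / m - (out_c * in_c) / m^2 ],
    where e_c counts links inside c, out_c and in_c are the summed
    out-degrees and in-degrees of c, and m is the number of links.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    intra = defaultdict(int)
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        out_deg[cu] += 1
        in_deg[cv] += 1
        if cu == cv:
            intra[cu] += 1
    clusters = set(out_deg) | set(in_deg)
    return sum(
        intra[c] / m - (out_deg[c] * in_deg[c]) / (m * m) for c in clusters
    )


# Hypothetical toy collection: two three-page link cycles joined by one link.
toy_edges = [
    ("a", "b"), ("b", "c"), ("c", "a"),
    ("d", "e"), ("e", "f"), ("f", "d"),
    ("c", "d"),
]
toy_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

toy_coverage = coverage(toy_edges, toy_clusters)
toy_modularity = directed_modularity(toy_edges, toy_clusters)
print(toy_coverage, toy_modularity)
```

Under these assumed definitions, a partition that keeps most links inside clusters scores high on both measures, while placing every page in one cluster gives coverage 1 but modularity 0.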
Added 28 Dec 2010
Updated 28 Dec 2010
Type Journal
Year 2008
Where SIGIR
Authors Konstantin Avrachenkov, Vladimir Dobrynin, Danil Nemirovsky, Son Kim Pham, Elena Smirnova