Sciweavers

SAC
2006
ACM

A scalable algorithm for high-quality clustering of web snippets

We consider the problem of partitioning, in a highly accurate and highly efficient way, a set of n documents lying in a metric space into k non-overlapping clusters. We augment the well-known furthest-point-first algorithm for k-center clustering in metric spaces with a filtering scheme based on the triangular inequality. We apply this algorithm to Web snippet clustering, comparing it against strong baselines consisting of recent, fast variants of the classical k-means iterative algorithm. Our main conclusion is that our method attains solutions of better or comparable accuracy, and does this within a fraction of the time required by the baselines. Our algorithm is thus valuable when, as in Web snippet clustering, either the real-time nature of the task or the large amount of data makes the poorly scalable, traditional clustering methods unsuitable.
Categories and Subject Descriptors H.3.3 [Information Search and Retrieval]: Clustering
Keywords Meta Search Engines, Web Snippets, Clus...
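The idea described in the abstract, furthest-point-first (FPF) selection of k centers combined with a triangle-inequality filter, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `fpf_cluster`, the Euclidean distance, the choice of the first point as the initial center, and the specific skip test are assumptions made for illustration. The filter relies on the fact that if a point p is currently assigned to center c and d(c, c_new) >= 2 * d(p, c), then d(p, c_new) >= d(c, c_new) - d(p, c) >= d(p, c), so p cannot move to the new center and the distance d(p, c_new) need not be computed.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance; any metric satisfying the triangle inequality works."""
    return math.dist(a, b)


def fpf_cluster(
    points: Sequence[Point],
    k: int,
    dist: Callable[[Point, Point], float] = euclidean,
) -> Tuple[List[int], List[int]]:
    """Hypothetical FPF sketch with a triangle-inequality filter.

    Returns (centers, assignment): `centers` holds indices into `points`,
    and `assignment[i]` is the position in `centers` of point i's center.
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(points)")

    centers = [0]  # assumption: start from the first point
    assignment = [0] * n
    d_near = [dist(points[0], p) for p in points]  # distance to own center

    while len(centers) < k:
        # Furthest-point-first step: the next center is the point that is
        # farthest from its currently assigned center.
        new = max(range(n), key=d_near.__getitem__)
        # Distances from the new center to every existing center.
        to_center = [dist(points[new], points[c]) for c in centers]
        centers.append(new)
        new_pos = len(centers) - 1

        for i in range(n):
            # Filter: if d(c, c_new) >= 2 * d(p, c), p cannot be closer to
            # the new center, so skip the distance computation entirely.
            if to_center[assignment[i]] >= 2 * d_near[i]:
                continue
            d = dist(points[new], points[i])
            if d < d_near[i]:
                d_near[i] = d
                assignment[i] = new_pos

    return centers, assignment
```

Calling `fpf_cluster(points, k)` on a list of coordinate tuples returns the chosen center indices and a per-point cluster label. The filter only removes distance computations and does not change the resulting partition, which is why this style of pruning helps when n is large or the distance function is expensive.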
Added 14 Jun 2010
Updated 14 Jun 2010
Type Conference
Year 2006
Where SAC
Authors Filippo Geraci, Marco Pellegrini, Paolo Pisati, Fabrizio Sebastiani