A scalable algorithm for high-quality clustering of web snippets



We consider the problem of partitioning, in a highly accurate and highly efficient way, a set of n documents lying in a metric space into k non-overlapping clusters. We augment the well-known furthest-point-first algorithm for k-center clustering in metric spaces with a filtering scheme based on the triangular inequality. We apply this algorithm to Web snippet clustering, comparing it against strong baselines consisting of recent, fast variants of the classical k-means iterative algorithm. Our main conclusion is that our method attains solutions of better or comparable accuracy, and does this within a fraction of the time required by the baselines. Our algorithm is thus valuable when, as in Web snippet clustering, either the real-time nature of the task or the large amount of data makes the poorly scalable, traditional clustering methods unsuitable.

Categories and Subject Descriptors: H.3.3 [Information Search and Retrieval]: Clustering

Keywords: Meta Search Engines, Web Snippets, Clus...
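The abstract only names the technique, so the following is a minimal sketch of how a furthest-point-first pass with a triangle-inequality filter might look, not the authors' implementation. It assumes points are numeric tuples compared with Euclidean distance, that the first center is simply the first point, and that the filter is the standard bound "if d(c, c_new) >= 2 * d(p, c), then p cannot be closer to c_new than to its current center c". The function name `fpf_with_filter` and the distance-call counter are hypothetical, introduced only for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance; any metric obeying the triangle inequality would do."""
    return math.dist(a, b)


def fpf_with_filter(
    points: Sequence[Point],
    k: int,
    dist: Callable[[Point, Point], float] = euclidean,
) -> Tuple[List[int], List[int], int]:
    """Illustrative furthest-point-first k-center clustering.

    Returns (indices of the chosen centers, cluster index of each point,
    number of distance evaluations performed).
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")

    centers = [0]        # assumption: the first point is the initial center
    assign = [0] * n     # position in `centers` of each point's nearest center
    calls = 0

    # Distance from every point to the first center.
    closest = []
    for p in points:
        closest.append(dist(points[0], p))
        calls += 1

    while len(centers) < k:
        # The next center is the point furthest from its nearest center.
        new = max(range(n), key=closest.__getitem__)
        new_pos = len(centers)

        # Distances between the new center and every existing center.
        gap = [dist(points[new], points[c]) for c in centers]
        calls += len(centers)
        centers.append(new)

        for i in range(n):
            # Triangle-inequality filter: d(p, new) >= gap - d(p, c) >= d(p, c),
            # so point i cannot move and the distance is never computed.
            if gap[assign[i]] >= 2 * closest[i]:
                continue
            d = dist(points[new], points[i])
            calls += 1
            if d < closest[i]:
                closest[i] = d
                assign[i] = new_pos

    return centers, assign, calls
```

On two well-separated groups of points, for example, the filter skips every point that already sits close to the first center, so fewer distances are evaluated than in the unfiltered algorithm, which recomputes the distance from every point to each new center.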

Filippo Geraci, Marco Pellegrini, Paolo Pisati, Fabrizio Sebastiani


Applied Computing | Metric Spaces | SAC 2006 | Web Snippets | Well-known Furthest-point-first Algorithm


» A Method of Web Search Result Clustering Based on Rough Sets

» Cluster Generation and Labeling for Web Snippets A Fast Accurate Hierarchical Solution

» Web Document Clustering A Feasibility Demonstration

» Scalable Hierarchical Clustering Method for Sequences of Categorical Values

» STC and NMSTC Two Novel Online Results Clustering Methods for Web Searching

» A personalized search engine based on websnippet hierarchical clustering

» Learning to cluster web search results

» OCluster Scalable Clustering of Large High Dimensional Data Sets

Post Info

Added	14 Jun 2010
Updated	14 Jun 2010
Type	Conference
Year	2006
Where	SAC
Authors	Filippo Geraci, Marco Pellegrini, Paolo Pisati, Fabrizio Sebastiani


Sciweavers
