Efficient Clustering with Limited Distance Information



Given a point set S and an unknown metric d on S, we study the problem of efficiently partitioning S into k clusters while querying few distances between the points. In our model we assume that we have access to one-versus-all queries that, given a point s ∈ S, return the distances between s and all other points. We show that, given a natural assumption about the structure of the instance, we can efficiently find an accurate clustering using only O(k) distance queries. We use our algorithm to cluster proteins by sequence similarity. This setting fits our model well because a fast sequence database search program can query one sequence against an entire dataset. We conduct an empirical study showing that, even though we query only a small fraction of the distances between the points, we produce clusterings that are close to a desired clustering given by manual classification.
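The abstract does not spell out the authors' algorithm. To make the one-versus-all query model concrete, here is a minimal sketch built on a different, standard technique: farthest-first traversal (Gonzalez's greedy k-center heuristic). It also issues exactly k queries, one per chosen landmark. It is not the method described above and carries none of its accuracy guarantees. The function name `farthest_first_clustering` and the oracle signature `query(s)`, which returns the list of distances from point s to every point, are hypothetical choices made for illustration.

```python
from typing import Callable, List, Sequence


def farthest_first_clustering(
    n: int,
    k: int,
    query: Callable[[int], Sequence[float]],
    first: int = 0,
) -> List[int]:
    """Label points 0..n-1 with k cluster ids using exactly k one-versus-all queries.

    Illustrative sketch only (farthest-first traversal), not the paper's algorithm.
    `query(s)` is a hypothetical oracle returning d(s, x) for every point x.
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # One query for the first landmark; every point starts in cluster 0.
    min_dist = list(query(first))  # distance from each point to its nearest landmark
    labels = [0] * n
    for c in range(1, k):
        # Next landmark: the point farthest from all landmarks chosen so far.
        landmark = max(range(n), key=lambda x: min_dist[x])
        row = query(landmark)  # one more one-versus-all query
        for x in range(n):
            if row[x] < min_dist[x]:
                min_dist[x] = row[x]
                labels[x] = c  # reassign x to the closer, newer landmark
    return labels


if __name__ == "__main__":
    points = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0, 9.2]  # toy 1-D data
    calls = 0

    def query(s: int) -> List[float]:
        global calls
        calls += 1
        return [abs(points[s] - p) for p in points]

    print(farthest_first_clustering(len(points), 3, query), "queries:", calls)
```

On the toy data the three groups {0.0, 0.1, 0.2}, {5.0, 5.1} and {9.0, 9.2} are separated after three queries. In a protein setting, the hypothetical `query` would wrap a sequence database search of one sequence against the whole dataset.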

Konstantin Voevodski, Maria-Florina Balcan, Heiko Röglin, Shang-Hua Teng, Yu Xia


CORR 2010 | Education | Fast Sequence Database | Natural Assumption | Sequence Similarity |


» Optimal distance geographic routing for energy efficient wireless sensor networks

» Recognising Informative Web Page Blocks Using Visual Segmentation for Efficient Informatio...

» A Distributed SDP Approach for Large-Scale Noisy Anchor-Free Graph Realization with Applicat...

» Efficiently Learning the Metric with Side-Information

» Distributed Energy-Efficient Hierarchical Clustering for Wireless Sensor Networks

» Barrier coverage with sensors of limited mobility

» Distance-Sensitive Information Brokerage in Sensor Networks

» LIMBO: Scalable Clustering of Categorical Data

Post Info

Added	09 Dec 2010
Updated	09 Dec 2010
Type	Journal
Year	2010
Where	CORR
Authors	Konstantin Voevodski, Maria-Florina Balcan, Heiko Röglin, Shang-Hua Teng, Yu Xia
