Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data

The problem of finding clusters in data is challenging when clusters are of widely differing sizes, densities and shapes, and when the data contains large amounts of noise and outliers. Many of these issues become even more significant when the data is of very high dimensionality, such as text or time series data. In this paper we present a novel clustering technique that addresses these issues. Our algorithm first finds the nearest neighbors of each data point and then redefines the similarity between pairs of points in terms of how many nearest neighbors the two points share. Using this new definition of similarity, we eliminate noise and outliers, identify core points, and then build clusters around the core points. The use of a shared nearest neighbor definition of similarity removes problems with varying density, while the use of core points handles problems with shape and size. We experimentally show that our algorithm performs better than traditional methods (e.g., K-means) on ...
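The steps named in the abstract (nearest neighbors, shared nearest neighbor similarity, core points, clusters built around core points) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give parameter names or thresholds, so `k`, `eps`, `min_pts`, and all function names below are hypothetical. The sketch also assumes Euclidean distance and counts shared neighbors only for pairs that appear in each other's neighbor lists, an assumption made here so that a distant outlier does not pick up similarity to a cluster it merely points at.

```python
import numpy as np

def knn_lists(X, k):
    """Indices of the k nearest neighbors of each row of X (self excluded)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]

def snn_similarity(neighbors):
    """Number of shared neighbors for each pair of mutual neighbors, else 0."""
    n = neighbors.shape[0]
    member = np.zeros((n, n), dtype=bool)
    member[np.arange(n)[:, None], neighbors] = True   # member[i, j]: j is in i's list
    m = member.astype(int)
    shared = m @ m.T                     # shared[i, j] = size of list overlap
    mutual = member & member.T           # assumption: require mutual neighbors
    return shared * mutual

def snn_cluster(X, k, eps, min_pts):
    """Label each point with a cluster id, or -1 for noise (illustrative only)."""
    sim = snn_similarity(knn_lists(X, k))
    strong = sim >= eps                  # links that survive the similarity threshold
    core = strong.sum(axis=1) >= min_pts # core points have enough strong links
    labels = np.full(len(X), -1)
    cluster = 0
    for i in np.flatnonzero(core):       # grow one cluster per unvisited core point
        if labels[i] != -1:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            for q in np.flatnonzero(strong[p] & core):
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    for i in np.flatnonzero(~core):      # attach non-core points to a strong core link
        cand = np.flatnonzero(strong[i] & core)
        if cand.size:
            labels[i] = labels[cand[np.argmax(sim[i, cand])]]
    return labels                        # points left at -1 are treated as noise

# Two tight groups of six points and one distant outlier (made-up data).
group = np.array([[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1], [0.05, 0.2], [0.2, 0.05]])
X_demo = np.vstack([group, group + 5.0, [[50.0, 50.0]]])
labels_demo = snn_cluster(X_demo, k=5, eps=3, min_pts=3)
```

In this toy run the two groups receive separate labels and the outlier stays at -1, because no point lists it as a neighbor. The pairwise distance matrix makes the sketch quadratic in the number of points, so it is suitable only for small examples.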

Levent Ertöz, Michael Steinbach, Vipin Kumar

Core Points | Data Mining | Nearest Neighbor | SDM 2003 | Time Series Data

Added	01 Nov 2010
Updated	01 Nov 2010
Type	Conference
Year	2003
Where	SDM
Authors	Levent Ertöz, Michael Steinbach, Vipin Kumar
