Efficient Distribution Mining and Classification



We define and solve the problem of "distribution classification" and, more generally, "distribution mining". Given n distributions (i.e., clouds) of multi-dimensional points, we want to classify them into k classes and to find patterns, rules, and outlier clouds. For example, consider the 2-d case of item sales, where, for each item sold, we record the unit price and quantity; each customer is then represented as a distribution/cloud of 2-d points (one for each item the customer bought). We want to group similar customers together, e.g., for market segmentation or anomaly/fraud detection. We propose D-Mine to achieve this goal. Our main contribution is Theorem 3.1, which shows how to use wavelets to speed up the cloud-similarity computations. Extensive experiments on both synthetic and real multidimensional data sets show that our method achieves up to 400 times faster wall-clock time than the naive implementation, with comparable (and occasionally better) classification quality.
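To make the setting concrete, the following is a minimal sketch of comparing customer clouds, not the D-Mine method and not Theorem 3.1, whose details are not given here. It assumes each cloud is summarized as a normalized 4x4 histogram over the unit square and compared with plain Euclidean (L2) distance; the helper names (`histogram`, `haar`, `l2`, `make_cloud`) and the synthetic customers are hypothetical. It illustrates one well-known reason wavelets can help: an orthonormal Haar transform preserves L2 distance, so a distance computed on only the first few (coarsest) coefficients is a cheap lower bound on the exact distance.

```python
import math
import random

def histogram(cloud, bins=4, lo=0.0, hi=1.0):
    """Flattened, normalized bins x bins histogram of 2-d points in [lo, hi]^2."""
    counts = [0.0] * (bins * bins)
    width = (hi - lo) / bins
    for x, y in cloud:
        i = min(int((x - lo) / width), bins - 1)
        j = min(int((y - lo) / width), bins - 1)
        counts[i * bins + j] += 1.0
    total = sum(counts)
    return [c / total for c in counts]

def haar(vec):
    """Orthonormal Haar transform; len(vec) must be a power of two."""
    out = list(vec)
    n = len(out)
    while n > 1:
        half = n // 2
        avg = [(out[2 * k] + out[2 * k + 1]) / math.sqrt(2) for k in range(half)]
        dif = [(out[2 * k] - out[2 * k + 1]) / math.sqrt(2) for k in range(half)]
        out[:n] = avg + dif  # coarse averages first, then detail coefficients
        n = half
    return out

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def make_cloud(rng, cx, cy, n=200, spread=0.1):
    """Hypothetical customer: n purchases scattered around (cx, cy), clipped to [0, 1]."""
    def clip(v):
        return min(max(v, 0.0), 1.0)
    return [(clip(rng.gauss(cx, spread)), clip(rng.gauss(cy, spread)))
            for _ in range(n)]

rng = random.Random(0)
a = make_cloud(rng, 0.25, 0.25)  # two customers with similar buying habits
b = make_cloud(rng, 0.30, 0.25)
c = make_cloud(rng, 0.75, 0.75)  # a customer with very different habits

ha, hb, hc = histogram(a), histogram(b), histogram(c)
wa, wb, wc = haar(ha), haar(hb), haar(hc)

exact_ab = l2(ha, hb)           # distance on full histograms
full_ab = l2(wa, wb)            # same distance, computed on all Haar coefficients
coarse_ab = l2(wa[:4], wb[:4])  # lower bound from the 4 coarsest coefficients

print(exact_ab, full_ab, coarse_ab, l2(ha, hc))
```

Under these assumptions, a cheap coarse distance can be used to discard clearly dissimilar clouds before computing exact distances; the similarity measure and pruning rule actually used by D-Mine may differ.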

Yasushi Sakurai, Rosalynn Chong, Lei Li, Christos Faloutsos


Data Mining | Distribution Classification | Multi-dimensional Points | Out-lier Clouds | SDM 2008 |


Post Info

Added	30 Oct 2010
Updated	30 Oct 2010
Type	Conference
Year	2008
Where	SDM
Authors	Yasushi Sakurai, Rosalynn Chong, Lei Li, Christos Faloutsos
