Efficient Distribution Mining and Classification

Download www.db.cs.cmu.edu

We define and solve the problem of "distribution classification" and, more generally, "distribution mining". Given n distributions (i.e., clouds) of multi-dimensional points, we want to classify them into k classes and to find patterns, rules, and outlier clouds. For example, consider the 2-d case of sales of items, where, for each item sold, we record the unit price and quantity; each customer is then represented as a distribution/cloud of 2-d points (one for each item the customer bought). We want to group similar customers together, e.g., for market segmentation or anomaly/fraud detection. We propose D-Mine to achieve this goal. Our main contribution is Theorem 3.1, which shows how to use wavelets to speed up the cloud-similarity computations. Extensive experiments on both synthetic and real multidimensional data sets show that our method achieves up to 400 times faster wall-clock time than the naive implementation, with comparable (and occasionally better) classification quality.
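To make the idea of wavelet-accelerated cloud comparison concrete, here is a minimal sketch. It is not Theorem 3.1 or the D-Mine algorithm, neither of which is stated in the abstract. It assumes each cloud is summarized as a normalized 2-d histogram over the unit square, and it assumes Euclidean distance as a stand-in similarity measure. The function names (histogram, haar, dist) and the parameters bins and m are hypothetical. The sketch relies on one general fact: an orthonormal Haar transform preserves Euclidean distance, so comparing only the first few (coarsest) coefficients gives a cheap lower bound on the full distance.

```python
import math
import random

def histogram(points, bins=4, lo=0.0, hi=1.0):
    """Flattened, normalized bins x bins histogram of a non-empty 2-d cloud.

    Assumes every coordinate lies in [lo, hi].
    """
    counts = [0.0] * (bins * bins)
    width = (hi - lo) / bins
    for x, y in points:
        i = min(int((x - lo) / width), bins - 1)
        j = min(int((y - lo) / width), bins - 1)
        counts[i * bins + j] += 1.0
    total = sum(counts)
    return [c / total for c in counts]

def haar(vec):
    """Orthonormal 1-d Haar transform; len(vec) must be a power of two.

    Coarse coefficients end up at the front of the returned list.
    """
    out = list(vec)
    n = len(out)
    while n > 1:
        half = n // 2
        avg = [(out[2 * k] + out[2 * k + 1]) / math.sqrt(2) for k in range(half)]
        diff = [(out[2 * k] - out[2 * k + 1]) / math.sqrt(2) for k in range(half)]
        out[:n] = avg + diff
        n = half
    return out

def dist(a, b, m=None):
    """Euclidean distance over the first m entries (all entries if m is None)."""
    m = len(a) if m is None else m
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in range(m)))

# Two synthetic "customers" whose purchases fall in different regions.
random.seed(0)
cloud_a = [(random.uniform(0, 0.5), random.uniform(0, 0.5)) for _ in range(200)]
cloud_b = [(random.uniform(0.5, 1), random.uniform(0.5, 1)) for _ in range(200)]

ha, hb = histogram(cloud_a), histogram(cloud_b)
wa, wb = haar(ha), haar(hb)

exact = dist(ha, hb)        # full histogram distance (16 terms)
approx = dist(wa, wb, m=4)  # only the 4 coarsest Haar coefficients
```

Because the truncated distance never exceeds the exact one, it can be used to discard clearly dissimilar clouds before paying for the full comparison. How D-Mine itself chooses and uses coefficients is a matter for the paper.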

Yasushi Sakurai, Rosalynn Chong, Lei Li, Christos Faloutsos

Data Mining | Distribution Classification | Multi-dimensional Points | Out-lier Clouds | SDM 2008 |

» Privacy-preserving SVM classification

» The IOC algorithm: efficient many-class non-parametric classification for high-dimensional dat...

» An Efficient Local Algorithm for Distributed Multivariate Regression in Peer-to-Peer Network...

» Adapting SVM Classifiers to Data with Shifted Distributions

» Learning Multilinear Representations of Distributions for Efficient Inference

» Distance Guided Classification with Gene Expression Programming

» Structure feature selection for graph classification

» Parallel Out-of-Core Divide-and-Conquer Techniques with Application to Classification Trees

Post Info

Added	30 Oct 2010
Updated	30 Oct 2010
Type	Conference
Year	2008
Where	SDM
Authors	Yasushi Sakurai, Rosalynn Chong, Lei Li, Christos Faloutsos

Sciweavers
