Sciweavers

ICDM
2009
IEEE

Scalable Algorithms for Distribution Search

Distribution data arise naturally in countless domains, such as meteorology, biology, geology, industry and economics. However, relatively little attention has been paid to data mining for large sets of distributions. Given n distributions of multiple categories and a query distribution Q, we want to find similar clouds (i.e., distributions) in order to discover patterns, rules and outlier clouds. For example, consider the numerical case of item sales, where, for each item sold, we record the unit price and quantity; each customer is then represented as a distribution of 2-d points (one for each item he or she bought). We want to find similar users, e.g., for market segmentation or anomaly/fraud detection. We propose to address this problem and present DSearch, which includes fast and effective algorithms for similarity search in large distribution datasets. Our main contributions are (1) approximate KL divergence, which can speed up cloud-similarity computations, (2) multi-step sequential scan...
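To make the problem setting concrete, the following is a minimal sketch of a baseline: exact symmetric KL divergence between bucketized customer clouds, followed by a plain sequential scan. It is not the DSearch method and does not implement the approximate KL divergence or the multi-step scan named in the abstract. The grid parameters (`bucket_size`, `num_buckets`), the add-one smoothing, the function names and the customer data are all assumptions made for illustration.

```python
import math
from collections import Counter


def histogram(points, bucket_size=10.0, num_buckets=4):
    """Bucketize 2-d points (unit price, quantity) into a normalized histogram.

    The grid layout and the add-one smoothing are illustrative assumptions;
    smoothing keeps every bucket non-zero so the logarithm is always defined.
    """
    counts = Counter()
    for x, y in points:
        # Clamp each coordinate into the valid bucket range [0, num_buckets - 1].
        bx = max(0, min(int(x // bucket_size), num_buckets - 1))
        by = max(0, min(int(y // bucket_size), num_buckets - 1))
        counts[(bx, by)] += 1
    total = len(points) + num_buckets * num_buckets
    return [
        (counts[(i, j)] + 1) / total
        for i in range(num_buckets)
        for j in range(num_buckets)
    ]


def symmetric_kl(p, q):
    """Symmetric KL divergence: KL(P||Q) + KL(Q||P) = sum((p - q) * log(p / q))."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


def most_similar(query, candidates):
    """Sequential scan: return the key whose histogram is closest to the query."""
    return min(candidates, key=lambda name: symmetric_kl(query, candidates[name]))


# Hypothetical customers, each a cloud of (unit price, quantity) points.
customers = {
    "alice": [(5, 2), (6, 3), (25, 1)],
    "bob": [(5, 1), (7, 2), (24, 2)],
    "carol": [(35, 30), (38, 35), (33, 39)],
}
hists = {name: histogram(points) for name, points in customers.items()}

# Search for the customer most similar to alice among the others.
others = {name: h for name, h in hists.items() if name != "alice"}
best = most_similar(hists["alice"], others)
```

In this toy data, alice and bob fall into the same buckets, so the scan selects bob. The exact scan costs one full divergence computation per stored distribution, which is the cost that the approximation and multi-step scan listed in the abstract are meant to reduce.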
Added: 23 May 2010
Updated: 23 May 2010
Type: Conference
Year: 2009
Where: ICDM
Authors: Yasuko Matsubara, Yasushi Sakurai, Masatoshi Yoshikawa