Distribution data naturally arise in countless domains, such as meteorology, biology, geology, industry and economics. However, relatively little attention has been paid to data m...
Outlier mining in d-dimensional point sets is a fundamental and well studied data mining task due to its variety of applications. Most such applications arise in high-dimensional ...
Sampling is a widely used technique to increase efficiency in database and data mining applications operating on large dataset. In this paper we present a scalable sampling imple...
We propose a variant of Orthogonal Matching Pursuit (OMP), called LoCOMP, for scalable sparse signal approximation. The algorithm is designed for shift-invariant signal dictionari...
Practical data mining rarely falls exactly into the supervised learning scenario. Rather, the growing amount of unlabeled data poses a big challenge to large-scale semi-supervised...