KDD
2012
ACM

A near-linear time approximation algorithm for angle-based outlier detection in high-dimensional data

Outlier mining in d-dimensional point sets is a fundamental and well-studied data mining task due to its variety of applications. Most such applications arise in high-dimensional domains. A bottleneck of existing approaches is that implicit or explicit assessments based on concepts of distance or nearest neighbor deteriorate in high-dimensional data. Following up on the work of Kriegel et al. (KDD ’08), we investigate the use of the angle-based outlier factor in mining high-dimensional outliers. While their algorithm runs in cubic time (with a quadratic-time heuristic), we propose a novel random projection-based technique that is able to estimate the angle-based outlier factor for all data points in time near-linear in the size of the data. Our approach is also suitable for parallel environments, where it can achieve a parallel speedup. We introduce a theoretical analysis of the quality of approximation to guarantee the reliability of our estimation algorithm. The empirical experiments...
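To make the cubic-time baseline mentioned in the abstract concrete, here is a minimal sketch of a brute-force angle-based score. It is an illustration under stated assumptions, not the authors' random projection method and not the exact factor of Kriegel et al.: it uses the plain, unweighted variance of the angles at each point, and the names `angle_variance` and `rank_outliers` are hypothetical. The intuition is that a point far outside the data sees all other points within a narrow cone, so its angles vary little.

```python
import math
from itertools import combinations


def angle_variance(points, i):
    """Variance of the angles at points[i] over all pairs of other points.

    Simplified, unweighted stand-in for an angle-based outlier factor
    (an assumption for illustration). Costs O(n^2) per point.
    """
    p = points[i]
    others = [q for j, q in enumerate(points) if j != i]
    angles = []
    for a, b in combinations(others, 2):
        u = [x - y for x, y in zip(a, p)]  # difference vector p -> a
        v = [x - y for x, y in zip(b, p)]  # difference vector p -> b
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            continue  # skip exact duplicates of p, where the angle is undefined
        cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
        angles.append(math.acos(cos))
    if not angles:
        return 0.0
    mean = sum(angles) / len(angles)
    return sum((t - mean) ** 2 for t in angles) / len(angles)


def rank_outliers(points):
    """Indices sorted by ascending angle variance (most outlying first).

    Scoring all n points this way takes O(n^3) time overall.
    """
    scores = [angle_variance(points, i) for i in range(len(points))]
    return sorted(range(len(points)), key=lambda i: scores[i])


if __name__ == "__main__":
    # Four corners of a unit square, its center, and one distant point.
    pts = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (10, 10)]
    print(rank_outliers(pts))
```

The triple loop implied here (every point, every pair of other points) is what makes the exact computation cubic in the number of points, which is the cost the abstract's near-linear estimator is meant to avoid.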
Ninh Pham, Rasmus Pagh
Added 28 Sep 2012
Updated 28 Sep 2012
Type Journal
Year 2012
Where KDD
Authors Ninh Pham, Rasmus Pagh