Outlier detection by sampling with accuracy guarantees

16 years 2 months ago

Download www.cise.ufl.edu

An effective approach to detecting anomalous points in a data set is distance-based outlier detection. This paper describes a simple sampling algorithm that efficiently detects distance-based outliers in domains where each distance computation is very expensive. Unlike existing algorithms, the sampling algorithm requires a fixed number of distance computations and can return good results with accuracy guarantees. The most computationally expensive part of estimating the accuracy of the result is sorting all of the distances computed by the sampling algorithm. This enables interactive-speed performance even with the most expensive distance computations. The paper's algorithms were tested over two domains that require expensive distance functions as well as ten additional real data sets. The experimental study demonstrates both the efficiency and effectiveness of the sampling algorithm in comparison with the state-of-the-art algorithm and the reliability of the accuracy guarante...
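The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way a fixed-budget sampling scheme for distance-based outliers could look. It assumes the outlier score of a point is its distance to the k-th nearest neighbor within a random sample of the other points. The function name `sampled_knn_outliers` and the parameters `k`, `sample_size`, `top`, and `seed` are hypothetical and do not come from the paper, and the accuracy-guarantee estimation is not shown.

```python
import random


def sampled_knn_outliers(points, dist, k=1, sample_size=20, top=5, seed=0):
    """Rank points by k-th nearest-neighbor distance within a random sample.

    Illustrative sketch only (names and defaults are assumptions).
    It performs exactly len(points) * sample_size calls to `dist`.
    """
    n = len(points)
    if not 1 <= k <= sample_size <= n - 1:
        raise ValueError("need 1 <= k <= sample_size <= len(points) - 1")

    rng = random.Random(seed)
    scores = []
    for i in range(n):
        # Draw sample_size distinct indices from the other n - 1 points,
        # skipping index i by shifting every drawn value >= i up by one.
        drawn = rng.sample(range(n - 1), sample_size)
        sample = [s + 1 if s >= i else s for s in drawn]

        # Only sample_size distance computations are made for this point.
        dists = sorted(dist(points[i], points[j]) for j in sample)

        # Score: distance to the k-th nearest neighbor inside the sample.
        scores.append((dists[k - 1], i))

    # Larger k-th neighbor distance means more outlying.
    scores.sort(reverse=True)
    return [i for _, i in scores[:top]]
```

With this layout the cost is controlled by `sample_size` rather than by the full pairwise distance matrix, which is the property the abstract emphasizes for expensive distance functions.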

Mingxi Wu, Chris Jermaine
