STOC
2001
ACM

Optimal outlier removal in high-dimensional spaces

We study the problem of finding an outlier-free subset of a set of points (or a probability distribution) in $n$-dimensional Euclidean space. As in [BFKV 99], a point $x$ is defined to be a $\beta$-outlier if there exists some direction $w$ in which its squared distance from the mean along $w$ is greater than $\beta$ times the average squared distance from the mean along $w$. Our main theorem is that for any $\epsilon > 0$, there exists a $(1-\epsilon)$ fraction of the original distribution that has no $O\!\left(\frac{n}{\epsilon}\left(b + \log\frac{n}{\epsilon}\right)\right)$-outliers (where $b$ is the bit precision of the input points), improving on the previous bound of $O(n^7 b/\epsilon)$. This is asymptotically the best possible, as shown by a matching lower bound. The theorem is constructive, and results in a $\frac{1}{1-\epsilon}$-approximation to the following optimization problem: given a distribution (i.e. the ability to sample from it) and a parameter $\epsilon > 0$, find the minimum $\beta$ for which there exists a subset of probability at least $1-\epsilon$ with no $\beta$-outliers.
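The $\beta$-outlier test can be made concrete with a short sketch. It assumes the data is a finite sample with mean $m$ whose covariance $\Sigma$ (the average of $(y-m)(y-m)^T$ over the sample) is invertible. Under that assumption, maximizing $(w\cdot(x-m))^2 / (w^T \Sigma w)$ over directions $w$ gives $(x-m)^T\Sigma^{-1}(x-m)$, attained at $w = \Sigma^{-1}(x-m)$ (Cauchy-Schwarz in the $\Sigma$ inner product). So $x$ is a $\beta$-outlier exactly when this squared Mahalanobis distance exceeds $\beta$. Averaged over the sample, the quantity equals $\operatorname{tr}(\Sigma^{-1}\Sigma) = n$, so no full-rank point set is free of $\beta$-outliers for $\beta < n$. The names `outlier_ratios` and `greedy_remove_outliers` are hypothetical, and the code uses NumPy. The second helper is only a naive baseline that repeatedly deletes the current outliers and recomputes the mean and covariance. It is not the constructive procedure from the paper and carries none of its $(1-\epsilon)$ guarantees.

```python
import numpy as np


def outlier_ratios(points: np.ndarray) -> np.ndarray:
    """For each row x of `points` (shape N x n), return
    max_w (w.(x - m))^2 / mean_y (w.(y - m))^2, with m the sample mean.

    Assumes the centered points span R^n, so the covariance is invertible;
    the maximum is then the squared Mahalanobis distance (x-m)^T S^{-1} (x-m).
    """
    m = points.mean(axis=0)
    centered = points - m
    cov = centered.T @ centered / len(points)  # average of (y-m)(y-m)^T
    # Solve cov @ z = (x - m) for all x at once instead of inverting cov.
    z = np.linalg.solve(cov, centered.T)  # shape (n, N)
    return np.einsum("ij,ji->i", centered, z)


def greedy_remove_outliers(points: np.ndarray, beta: float) -> np.ndarray:
    """Hypothetical naive baseline (not the paper's algorithm): drop every
    current beta-outlier, recompute mean and covariance, and repeat until
    none remain. Returns a boolean mask of the kept rows."""
    n = points.shape[1]
    keep = np.ones(len(points), dtype=bool)
    while True:
        if keep.sum() <= n:
            # Too few points for an invertible covariance. With beta < n the
            # loop never stops early, since the average ratio is exactly n.
            raise ValueError("too few points left; beta is likely too small")
        bad = outlier_ratios(points[keep]) > beta
        if not bad.any():
            return keep
        # Map positions in the kept subset back to indices in `points`.
        keep[np.flatnonzero(keep)[bad]] = False


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 500 Gaussian points in R^3 plus one planted far-away point.
    X = np.vstack([rng.standard_normal((500, 3)), [[50.0, 0.0, 0.0]]])
    print(outlier_ratios(X).mean())  # equals n = 3 up to rounding
    keep = greedy_remove_outliers(X, beta=30.0)
    print(keep[-1], keep.sum())  # the planted point is dropped
```

Solving against the covariance rather than forming $\Sigma^{-1}$ explicitly is the usual numerically safer choice; a rank-deficient sample would need a pseudo-inverse or a restriction to the span of the data instead.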
John Dunagan, Santosh Vempala
Type Conference
Year 2001
Where STOC
Authors John Dunagan, Santosh Vempala