Optimal outlier removal in high-dimensional

We study the problem of finding an outlier-free subset of a set of points (or a probability distribution) in n-dimensional Euclidean space. As in [BFKV 99], a point x is defined to be a β-outlier if there exists some direction w in which its squared distance from the mean along w is greater than β times the average squared distance from the mean along w. Our main theorem is that for any ε > 0, there exists a (1-ε) fraction of the original distribution that has no O((n/ε)(b + log(n/ε)))-outliers, improving on the previous bound of O(n^7 b/ε). This is asymptotically the best possible, as shown by a matching lower bound. The theorem is constructive, and results in a 1/(1-ε) approximation to the following optimization problem: given a distribution μ (i.e. the ability to sample from it), and a parameter ε > 0, find the minimum β for which there exists a subset of probability at least (1-ε) with no β-outliers.
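
The outlier definition above can be checked numerically for a finite point set. The following is a minimal sketch, not the paper's algorithm: it only tests the definition and does not perform the removal procedure. It assumes the covariance matrix of the points is nonsingular, in which case the worst direction w has a closed form (by the Cauchy-Schwarz inequality): the largest ratio of squared distance along w to average squared distance along w equals (x - mean)^T C^{-1} (x - mean), where C is the covariance. The names `mean_and_covariance`, `solve`, `outlier_score`, and `is_beta_outlier` are hypothetical helpers introduced for illustration, and `beta` stands for the outlier threshold.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def mean_and_covariance(points: Sequence[Sequence[float]]) -> Tuple[List[float], Matrix]:
    """Return the mean and the covariance matrix (normalised by the point count)."""
    m, n = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / m for i in range(n)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / m
            for j in range(n)] for i in range(n)]
    return mean, cov


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            # Assumption of this sketch: the points span all n dimensions.
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def outlier_score(x: Sequence[float], mean: Sequence[float], cov: Matrix) -> float:
    """Largest value over directions w of (w.(x-mean))^2 / (w^T cov w)."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    z = solve(cov, d)  # z = cov^{-1} d
    return sum(di * zi for di, zi in zip(d, z))


def is_beta_outlier(x: Sequence[float], mean: Sequence[float], cov: Matrix,
                    beta: float) -> bool:
    """True if some direction w makes x exceed beta times the average."""
    return outlier_score(x, mean, cov) > beta


if __name__ == "__main__":
    pts = [(1.0, 0.0), (-1.0, 0.0), (0.0, 2.0), (0.0, -2.0)]
    mean, cov = mean_and_covariance(pts)
    print(outlier_score((1.0, 0.0), mean, cov))
    print(is_beta_outlier((3.0, 0.0), mean, cov, beta=10.0))
```

Working with the closed form avoids searching over directions w explicitly; a point is a beta-outlier exactly when its score exceeds beta.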

John Dunagan, Santosh Vempala

Algorithms | Average Squared Distance | N-dimensional Euclidean Space | Probability Distribution | STOC 2001 |

Added	03 Dec 2009
Updated	03 Dec 2009
Type	Conference
Year	2001
Where	STOC
Authors	John Dunagan, Santosh Vempala
