DATAMINE
2006

Computing LTS Regression for Large Data Sets

Least trimmed squares (LTS) regression is based on the subset of h cases (out of n) whose least squares fit possesses the smallest sum of squared residuals. The coverage h may be set between n/2 and n. The LTS method was proposed by Rousseeuw (1984, p. 876) as a highly robust regression estimator, with breakdown value (n − h)/n. It turned out that the computation time of existing LTS algorithms grew too fast with the size of the data set, precluding their use for data mining. Therefore we develop a new algorithm called FAST-LTS. The basic ideas are an inequality involving order statistics and sums of squared residuals, and techniques which we call 'selective iteration' and 'nested extensions'. We also use an intercept adjustment technique to improve the precision. For small data sets FAST-LTS typically finds the exact LTS, whereas for larger data sets it gives more accurate results than existing algorithms for LTS and is faster by orders of magnitude. Moreover, FAST-LTS runs fas...
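To make the abstract's ideas concrete, here is a minimal Python sketch of LTS for a straight line y = a + b·x. It assumes a single predictor plus intercept so that it needs no libraries, and it is our reading of the ideas, not the authors' FAST-LTS code. The objective is the sum of the h smallest squared residuals. A "concentration step" refits least squares on the h cases with the smallest squared residuals, and such a step can never increase the objective, which is the kind of inequality the abstract mentions. Selective iteration is approximated by running only two such steps from many random two-point starts and iterating only the best few to convergence. Finally, the intercept is re-chosen as the exact univariate LTS location of y − b·x. Nested extensions for large n are omitted. The function names, the default h = ⌊(n + 3)/2⌋, and the values n_starts=500 and keep=10 are illustrative assumptions.

```python
import random

# Hypothetical sketch of LTS for a line y = a + b*x; not the authors' FAST-LTS code.


def ls_line(points):
    """Ordinary least squares fit of y = a + b*x to (x, y) pairs; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx if sxx > 0 else 0.0  # degenerate (all x equal): flat line
    return my - b * mx, b


def trimmed_ss(points, fit, h):
    """LTS objective: sum of the h smallest squared residuals of the fit."""
    a, b = fit
    r2 = sorted((y - a - b * x) ** 2 for x, y in points)
    return sum(r2[:h])


def c_step(points, fit, h):
    """Concentration step: refit LS on the h cases with the smallest squared residuals."""
    a, b = fit
    order = sorted(range(len(points)),
                   key=lambda i: (points[i][1] - a - b * points[i][0]) ** 2)
    return ls_line([points[i] for i in order[:h]])


def adjust_intercept(points, b, h):
    """Exact univariate LTS location of z = y - b*x: the mean of the window of h
    consecutive sorted z values with the smallest sum of squared deviations."""
    z = sorted(y - b * x for x, y in points)
    best_ss, best_mean = float("inf"), 0.0
    for j in range(len(z) - h + 1):
        w = z[j:j + h]
        m = sum(w) / h
        ss = sum((v - m) ** 2 for v in w)
        if ss < best_ss:
            best_ss, best_mean = ss, m
    return best_mean


def lts_line(points, h=None, n_starts=500, keep=10, seed=0, max_iter=100):
    """Approximate LTS line; the h, n_starts and keep defaults are assumptions."""
    n = len(points)
    if h is None:
        h = (n + 3) // 2  # floor((n + p + 1) / 2) with p = 2 parameters
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_starts):
        fit = ls_line(rng.sample(points, 2))  # exact line through 2 random cases
        for _ in range(2):  # only two concentration steps per start
            fit = c_step(points, fit, h)
        candidates.append((trimmed_ss(points, fit, h), fit))
    candidates.sort(key=lambda t: t[0])
    best_q, best_fit = float("inf"), None
    for q, fit in candidates[:keep]:  # iterate only the most promising starts
        for _ in range(max_iter):
            new_fit = c_step(points, fit, h)
            new_q = trimmed_ss(points, new_fit, h)
            if new_q >= q:  # objective stopped decreasing
                break
            q, fit = new_q, new_fit
        if q < best_q:
            best_q, best_fit = q, fit
    b = best_fit[1]
    best_fit = (adjust_intercept(points, b, h), b)  # intercept adjustment
    return best_fit, trimmed_ss(points, best_fit, h)


# Usage: 20 points on y = 1 + 2x plus 5 gross outliers.
pts = [(x, 1.0 + 2.0 * x) for x in range(20)] + [(x, 50.0) for x in range(5, 10)]
(a, b), q = lts_line(pts)
print(f"a={a:.3f}, b={b:.3f}, objective={q:.2e}")
```

With n = 25 the sketch uses h = 14. Any 14 of the 20 clean points fit exactly, so the LTS line should be the clean line (intercept near 1, slope near 2) with an objective of essentially zero, whereas ordinary least squares on all 25 points is pulled toward the outliers. Because the intercept adjustment minimizes the objective over the intercept with the slope held fixed, it cannot make the final fit worse.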
Peter Rousseeuw, Katrien van Driessen
Added 11 Dec 2010
Updated 11 Dec 2010
Type Journal
Year 2006
Where DATAMINE
Authors Peter Rousseeuw, Katrien van Driessen