In this paper we address the problem of selecting variables or features in a regression model in the presence of both additive (vertical) and leverage outliers. Since variable selection and the detection of anomalous data are not separable problems, we focus on methods that select variables and outliers simultaneously. For selection, we use least angle regression (LARS), a fast forward selection algorithm that is not itself robust. To achieve robustness to additive outliers, we append an identity matrix of dummy variables, one per observation, to the design matrix and allow both real variables and additive outliers to enter the selection set. For leverage outliers, we apply these selection methods to samples of elemental sets in a manner similar to that used in high breakdown robust estimation. Bagging is then used to stabilize the selection results. We conclude by comparing our results to several other selection methods of varying computational complexity and robustness and discussing the extension of our methods to situations where the nu...
Lauren McCann, Roy E. Welsch
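
The dummy-variable augmentation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain greedy forward stepwise selection stands in for LARS so that the example needs no external library, and the helper names (`augment`, `forward_select`, `split_selection`) and the toy data are hypothetical. The sketch covers only the additive-outlier step; the elemental-set sampling and bagging stages are omitted.

```python
from math import sqrt

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def augment(X):
    """Append an n x n identity block: one dummy column per observation."""
    n = len(X)
    return [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
            for i, row in enumerate(X)]

def forward_select(Z, y, steps):
    """Greedy forward stepwise selection (a stand-in for LARS, by assumption).

    At each step, pick the column whose component orthogonal to the
    already selected columns is most correlated with the residual,
    i.e. the column that most reduces the residual sum of squares.
    """
    n, m = len(Z), len(Z[0])
    cols = [[Z[i][j] for i in range(n)] for j in range(m)]
    resid = list(y)
    basis, selected = [], []  # orthonormal basis of the selected columns
    for _ in range(steps):
        best, best_score, best_q = None, 1e-12, None
        for j in range(m):
            if j in selected:
                continue
            # Gram-Schmidt: remove the part explained by selected columns.
            q = list(cols[j])
            for b in basis:
                d = dot(b, q)
                q = [qi - d * bi for qi, bi in zip(q, b)]
            norm = sqrt(dot(q, q))
            if norm < 1e-10:
                continue  # column is (numerically) already spanned
            q = [qi / norm for qi in q]
            score = abs(dot(q, resid))
            if score > best_score:
                best, best_score, best_q = j, score, q
        if best is None:
            break  # nothing left that reduces the residual
        c = dot(best_q, resid)
        resid = [r - c * qi for r, qi in zip(resid, best_q)]
        selected.append(best)
        basis.append(best_q)
    return selected

def split_selection(selected, p):
    """Separate real variables (index < p) from flagged observations."""
    variables = [j for j in selected if j < p]
    outliers = [j - p for j in selected if j >= p]
    return variables, outliers

# Hypothetical toy data: y depends on x0 only; observation 3 is shifted
# vertically by 15 to act as an additive outlier.
x0 = [float(i) for i in range(1, 9)]
x1 = [1.0 if i % 2 == 0 else -1.0 for i in range(8)]
X = [[a, b] for a, b in zip(x0, x1)]
y = [2.0 * a for a in x0]
y[3] += 15.0

selected = forward_select(augment(X), y, steps=2)
variables, outliers = split_selection(selected, p=2)
print(variables, outliers)
```

On this toy data the two selection steps pick the real variable `x0` and the dummy column for observation index 3, so the outlier is flagged in the same pass that chooses the variable. Because the augmented matrix has more columns than rows, the number of steps must be capped; choosing that cap is a modelling decision this sketch does not address.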