This paper studies the outlier detection problem from the point of view of penalized regressions. Our regression model adds one mean shift parameter for each of the n data points. We then apply a regularization favoring a sparse vector of mean shift parameters. The usual L1 penalty yields a convex criterion, but we find that it fails to deliver a robust estimator. The L1 penalty corresponds to soft thresholding. We introduce a thresholding (denoted by Θ) based iterative procedure for outlier detection (Θ-IPOD). A version based on hard thresholding correctly identifies outliers on some hard test problems. We find that Θ-IPOD is much faster than iteratively reweighted least squares for large data because each iteration costs at most O(np) (and sometimes much less), avoiding an O(np²) least squares estimate. We describe the connection between Θ-IPOD and M-estimators. Our proposed method has one tuning parameter with which to both identify outliers and estimate regression coefficients. A data-...
Yiyuan She, Art B. Owen
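For intuition, the following is a minimal sketch of the fit-and-threshold idea behind Θ-IPOD, written in an equivalent alternating form: fit the regression on y − γ, then threshold the residuals to update the mean-shift vector γ. It assumes hard thresholding for Θ; the function names (theta_ipod, hard_threshold), the QR-based update, and the demo data are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def hard_threshold(r, lam):
    # Theta(r; lam) with hard thresholding: keep entries with |r_i| > lam, zero the rest
    return np.where(np.abs(r) > lam, r, 0.0)

def theta_ipod(X, y, lam, max_iter=200, tol=1e-8):
    # Factor X once (O(np^2)); afterwards each iteration is O(np):
    # a matrix-vector product Q.T @ v, a p x p triangular solve, and X @ beta.
    Q, R = np.linalg.qr(X)
    gamma = np.zeros_like(y)            # one mean-shift parameter per observation
    for _ in range(max_iter):
        beta = np.linalg.solve(R, Q.T @ (y - gamma))   # least squares fit on y - gamma
        gamma_new = hard_threshold(y - X @ beta, lam)  # threshold the residuals
        if np.max(np.abs(gamma_new - gamma)) < tol:
            gamma = gamma_new
            break
        gamma = gamma_new
    outliers = np.flatnonzero(gamma)    # nonzero mean shifts flag outliers
    return beta, gamma, outliers

# Illustrative usage on synthetic data with five planted gross outliers.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = X @ np.ones(p) + 0.1 * rng.normal(size=n)
y[:5] += 8.0
beta, gamma, outliers = theta_ipod(X, y, lam=3.0)
print(outliers)   # expected to flag indices 0..4
```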