For many applied problems in clustering via mixture models, the estimates of the component means and covariance matrices can be unduly affected by observations that are atypical of the components in the mixture model being fitted. In this paper, we consider a robust estimation procedure for Gaussian mixtures that uses multiresolution kd-trees. The method provides a fast EM-based approach to fitting Gaussian mixtures to huge data sets. In addition, robustness against outliers is achieved by giving reduced weight to observations that are atypical of a component. The method is illustrated using real and simulated data.
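To make the idea of down-weighting atypical observations concrete, here is a minimal sketch of one robust M-step. The abstract does not state which weight function is used, so this sketch assumes a Huber-type weight on each point's Mahalanobis distance to a component (weight 1 inside a tuning constant `k`, `k/d` beyond it), which is one standard M-estimation choice. It also uses a simplified weighted covariance update and omits the kd-tree acceleration entirely. The names `huber_weights`, `mahalanobis`, `robust_m_step`, and the default `k=2.0` are hypothetical and are not taken from the paper.

```python
import numpy as np

def huber_weights(d, k=2.0):
    """Assumed Huber-type weight: 1 for distances <= k, k/d for larger distances."""
    d = np.asarray(d, dtype=float)
    # np.maximum guards against division by zero; np.where evaluates both branches.
    return np.where(d <= k, 1.0, k / np.maximum(d, 1e-12))

def mahalanobis(X, mean, cov):
    """Mahalanobis distance of each row of X from mean under covariance cov."""
    diff = X - mean
    # Solve cov @ s = diff^T rather than forming the inverse explicitly.
    sol = np.linalg.solve(cov, diff.T).T
    return np.sqrt(np.einsum("ij,ij->i", diff, sol))

def robust_m_step(X, tau, means, covs, k=2.0):
    """One simplified robust M-step (hypothetical sketch, not the paper's exact update).

    tau[:, g] holds the E-step posterior probabilities for component g; each is
    multiplied by a Huber weight so outlying points pull less on the estimates.
    """
    n, p = X.shape
    n_comp = tau.shape[1]
    pis = tau.sum(axis=0) / n  # mixing proportions from the posteriors alone
    new_means = np.empty((n_comp, p))
    new_covs = np.empty((n_comp, p, p))
    for g in range(n_comp):
        w = tau[:, g] * huber_weights(mahalanobis(X, means[g], covs[g]), k)
        new_means[g] = w @ X / w.sum()
        diff = X - new_means[g]
        new_covs[g] = (w[:, None] * diff).T @ diff / w.sum()
    return pis, new_means, new_covs
```

For example, with four points at the corners of the unit square plus one far outlier at (50, 50), the plain sample mean is dragged to (10.4, 10.4), while the robustly weighted mean stays below 1 in each coordinate because the outlier's weight shrinks to roughly 0.03.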
Shu-Kay Ng, Geoffrey J. McLachlan