Clustering is a basic task in a variety of machine learning applications. Partitioning a set of input vectors into compact, well-separated subsets can be severely affected by the presence of model-incompatible inputs called outliers. The present paper develops robust clustering algorithms that jointly partition the data and identify the outliers. The novel approach translates the scarcity of outliers into sparsity in a judiciously defined domain, and uses this to robustify three widely used clustering schemes: hard K-means, fuzzy K-means, and probabilistic clustering. Cluster centers and assignments are iteratively updated in closed form. The developed outlier-aware algorithms are guaranteed to converge, while their computational complexity is of the same order as that of their outlier-agnostic counterparts. Preliminary simulations validate the analytical claims.
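The abstract does not spell out the model. The following is a minimal sketch of one way to "translate scarcity of outliers into sparsity" for hard K-means. It assumes that each point carries its own outlier vector o_n, with x_n ≈ m_{c(n)} + o_n, and that a group-lasso penalty lam * Σ_n ||o_n||_2 keeps most o_n exactly zero. The function name, the update order, and the parameters `lam`, `iters`, and `init_centers` are illustrative assumptions, not the paper's algorithm. Each block update (assignments, outliers, centers) is an exact closed-form minimization of the assumed objective, so under this assumption the cost cannot increase from one iteration to the next.

```python
import math
import random


def _sub(a, b):
    """Element-wise difference of two equal-length vectors."""
    return [ai - bi for ai, bi in zip(a, b)]


def _norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(vi * vi for vi in v))


def robust_hard_kmeans(X, K, lam, iters=50, init_centers=None, seed=0):
    """Hypothetical sketch of outlier-aware hard K-means.

    Assumed objective, minimized by block coordinate descent:
        sum_n ||x_n - o_n - m_{c(n)}||^2 + lam * sum_n ||o_n||_2
    Returns (centers, labels, indices of points with a nonzero o_n).
    """
    d = len(X[0])
    if init_centers is None:
        init_centers = random.Random(seed).sample(X, K)
    M = [list(map(float, m)) for m in init_centers]
    O = [[0.0] * d for _ in X]  # one outlier vector per point, initially zero
    labels = [0] * len(X)

    for _ in range(iters):
        # 1) Assignments: nearest center to the outlier-compensated point x_n - o_n.
        for n, x in enumerate(X):
            y = _sub(x, O[n])
            labels[n] = min(range(K), key=lambda k: _norm(_sub(y, M[k])))

        # 2) Outliers: group soft-thresholding of the residual r_n = x_n - m_{c(n)}.
        #    o_n stays exactly zero unless ||r_n|| exceeds lam / 2.
        for n, x in enumerate(X):
            r = _sub(x, M[labels[n]])
            nr = _norm(r)
            scale = max(0.0, 1.0 - lam / (2.0 * nr)) if nr > 0 else 0.0
            O[n] = [scale * ri for ri in r]

        # 3) Centers: mean of the compensated points x_n - o_n in each cluster.
        for k in range(K):
            members = [_sub(X[n], O[n]) for n in range(len(X)) if labels[n] == k]
            if members:  # keep the previous center if a cluster becomes empty
                M[k] = [sum(col) / len(members) for col in zip(*members)]

    outliers = [n for n in range(len(X)) if _norm(O[n]) > 0]
    return M, labels, outliers
```

As an example, take two tight groups of four points near (0, 0) and (10, 10) plus one far point (100, -100), with `lam=4.0` and initial centers (0, 0) and (10, 10). In this setting the far point should be the only one with a nonzero o_n. The center of the second group then stays at (10.5, 10.5), because no outlier is assigned to it. Larger `lam` values flag fewer points, and the outlier-free K-means updates are recovered as `lam` grows without bound.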
Pedro A. Forero, Vassilis Kekatos, Georgios B. Gia