Sciweavers

DATAMINE
2006

Accelerated EM-based clustering of large data sets

Motivated by the poor performance (linear complexity) of the EM algorithm in clustering large data sets, and inspired by the successful accelerated versions of related algorithms like k-means, we derive an accelerated variant of the EM algorithm for Gaussian mixtures that: (1) offers speedups that are at least linear in the number of data points, (2) ensures convergence by strictly increasing a lower bound on the data log-likelihood in each learning step, and (3) allows ample freedom in the design of other accelerated variants. We also derive a similar accelerated algorithm for greedy mixture learning, where very satisfactory results are obtained. The core idea is to define a lower bound on the data log-likelihood based on a grouping of data points. The bound is maximized by computing in turn (i) optimal assignments of groups of data points to the mixture components, and (ii) optimal reestimation of the model parameters based on average sufficient statistics computed over groups of dat...
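The two alternating steps described in the abstract can be illustrated with a minimal sketch in Python/NumPy. Everything here is an assumption made for illustration, not the authors' implementation: the function names (`group_stats`, `e_step`, `m_step`, `grouped_em`), the `labels` array that supplies a fixed partition of the data into groups, the identity-covariance initialization, and the small `reg` term added to the covariances. The abstract gives no formulas, so the sketch assumes one natural reading of step (i): all points in a group share a single responsibility vector, proportional to the mixing weight times the exponential of the group-averaged Gaussian log-density. Step (ii) then needs only each group's size, mean, and average outer product.

```python
import numpy as np

def group_stats(X, labels):
    """Per-group size n_A, mean <x>_A and second moment <x x^T>_A."""
    _, inv = np.unique(np.asarray(labels).ravel(), return_inverse=True)
    inv = inv.ravel()
    G, d = inv.max() + 1, X.shape[1]
    n = np.bincount(inv, minlength=G).astype(float)
    m1 = np.zeros((G, d))
    np.add.at(m1, inv, X)                              # sum of x per group
    m2 = np.zeros((G, d, d))
    np.add.at(m2, inv, X[:, :, None] * X[:, None, :])  # sum of x x^T per group
    return n, m1 / n[:, None], m2 / n[:, None, None]

def e_step(m1, m2, pi, mu, Sigma):
    """Shared responsibilities per group: q_A(s) ~ pi_s * exp(<log N(x; mu_s, Sigma_s)>_A)."""
    G, d = m1.shape
    K = len(pi)
    log_q = np.empty((G, K))
    for s in range(K):
        P = np.linalg.inv(Sigma[s])
        _, logdet = np.linalg.slogdet(Sigma[s])
        tr = np.einsum('ij,gji->g', P, m2)   # tr(P <x x^T>_A)
        cross = m1 @ P @ mu[s]               # mu^T P <x>_A
        quad = mu[s] @ P @ mu[s]             # mu^T P mu
        avg_loglik = -0.5 * (d * np.log(2 * np.pi) + logdet + tr - 2 * cross + quad)
        log_q[:, s] = np.log(pi[s]) + avg_loglik
    log_q -= log_q.max(axis=1, keepdims=True)  # stabilize before exponentiating
    q = np.exp(log_q)
    return q / q.sum(axis=1, keepdims=True)

def m_step(n, m1, m2, q, reg=1e-6):
    """Re-estimate parameters from group statistics weighted by n_A * q_A(s)."""
    w = n[:, None] * q                       # effective counts, shape (G, K)
    Nk = w.sum(axis=0)
    pi = Nk / n.sum()
    mu = (w.T @ m1) / Nk[:, None]
    S2 = np.einsum('gk,gij->kij', w, m2) / Nk[:, None, None]
    Sigma = S2 - mu[:, :, None] * mu[:, None, :] + reg * np.eye(m1.shape[1])
    return pi, mu, Sigma

def grouped_em(X, labels, mu0, n_iter=50):
    """Alternate the two steps; the per-iteration cost depends on the number of groups."""
    n, m1, m2 = group_stats(X, labels)
    K, d = mu0.shape
    pi = np.full(K, 1.0 / K)
    mu = np.array(mu0, dtype=float)
    Sigma = np.stack([np.eye(d)] * K)        # assumed initialization, scale dependent
    for _ in range(n_iter):
        q = e_step(m1, m2, pi, mu, Sigma)
        pi, mu, Sigma = m_step(n, m1, m2, q)
    return pi, mu, Sigma
```

After the one-off pass in `group_stats`, each iteration touches only the group statistics, so coarser partitions make iterations cheaper at the price of a looser bound. When every point is its own group, the updates reduce to those of standard EM for a Gaussian mixture. The sketch does not guard against a component whose effective count drops to zero.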
Jakob J. Verbeek, Jan Nunnink, Nikos A. Vlassis
Added: 11 Dec 2010
Updated: 11 Dec 2010
Type: Journal
Year: 2006
Where: DATAMINE
Authors: Jakob J. Verbeek, Jan Nunnink, Nikos A. Vlassis