We show a close relationship between the Expectation-Maximization (EM) algorithm and direct optimization algorithms such as gradient-based methods for parameter learning. We identify analytic conditions under which EM exhibits Newton-like behavior, and conditions under which it possesses poor, first-order convergence. Based on this analysis, we propose two novel algorithms for maximum likelihood estimation of latent variable models, and report empirical results showing that, as predicted by theory, the proposed algorithms can substantially outperform standard EM in speed of convergence in certain cases.
Ruslan Salakhutdinov, Sam T. Roweis, Zoubin Ghahramani
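For background only, the contrast between Newton-like and first-order behavior mentioned above can be sketched via the standard local analysis of the EM fixed-point map; this is a generic illustration, not a quotation of the paper's own derivation, and the symbols $M$, $\mathcal{I}_c$, and $\mathcal{I}_o$ are conventional notation introduced here rather than taken from the abstract. Viewing EM as an iteration $\theta^{(t+1)} = M(\theta^{(t)})$ with fixed point $M(\theta^*) = \theta^*$ at a local maximum, a Taylor expansion gives
\[
\theta^{(t+1)} - \theta^* \;\approx\; M'(\theta^*)\,\bigl(\theta^{(t)} - \theta^*\bigr),
\qquad
M'(\theta^*) \;=\; I \,-\, \mathcal{I}_c(\theta^*)^{-1}\,\mathcal{I}_o(\theta^*),
\]
where $\mathcal{I}_c$ denotes the expected complete-data information and $\mathcal{I}_o$ the observed-data information. When the fraction of missing information is small, $M'(\theta^*) \approx 0$ and each EM step resembles a Newton-like update; when it is large, the spectral radius of $M'(\theta^*)$ approaches one and EM shows the slow, first-order convergence referred to in the abstract.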