We show that, given data from a mixture of k well-separated spherical Gaussians in R^d, a simple two-round variant of EM will, with high probability, learn the parameters of the Gaussians to near-optimal precision, provided the dimension is high (d ≫ ln k). We relate this result to previous theoretical and empirical work on the EM algorithm.
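The abstract does not give the update rules. As a hedged illustration, here is a minimal sketch of one round of standard EM for a mixture of spherical Gaussians, where component j has mixing weight w_j, mean mu_j and covariance sigma_j^2 I, written in plain Python. It is not the authors' procedure. The function name `em_round` and its argument layout are hypothetical, and the sketch does not reproduce whatever initialization and pruning the two-round variant applies before or between rounds. It assumes every component receives nonzero total responsibility.

```python
import math


def em_round(points, means, variances, weights):
    """One standard EM update for a mixture of spherical Gaussians (sketch).

    points: list of n points, each a list of d floats.
    means: list of k centers (each a list of d floats).
    variances: list of k per-component variances sigma_j^2 (covariance sigma_j^2 * I).
    weights: list of k mixing weights summing to 1.
    Returns (new_means, new_variances, new_weights).
    Assumes every component ends up with nonzero total responsibility.
    """
    n, d, k = len(points), len(points[0]), len(means)

    # E-step: log responsibilities, normalized with the log-sum-exp trick,
    # because raw Gaussian densities underflow quickly when d is large.
    resp = []
    for x in points:
        logs = []
        for j in range(k):
            sq = sum((xi - mi) ** 2 for xi, mi in zip(x, means[j]))
            logs.append(
                math.log(weights[j])
                - 0.5 * d * math.log(2 * math.pi * variances[j])
                - sq / (2 * variances[j])
            )
        top = max(logs)
        total = sum(math.exp(l - top) for l in logs)
        resp.append([math.exp(l - top) / total for l in logs])

    # M-step: re-estimate weights, means, and one variance per component.
    new_means, new_vars, new_weights = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)  # effective number of points in component j
        mu = [sum(r[j] * x[t] for r, x in zip(resp, points)) / nj for t in range(d)]
        sq = sum(
            r[j] * sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
            for r, x in zip(resp, points)
        )
        new_means.append(mu)
        new_vars.append(sq / (d * nj))  # spherical: average over all d coordinates
        new_weights.append(nj / n)
    return new_means, new_vars, new_weights
```

A second round is simply another call on the returned parameters. The spherical assumption is what reduces each covariance to a single scalar sigma_j^2, estimated as the responsibility-weighted mean squared distance divided by d.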
Sanjoy Dasgupta, Leonard J. Schulman