We consider the problem of learning mixtures of distributions via spectral methods and derive a tight characterization of when such methods are useful. Specifically, given a mixture-sample, let $\mu_i$, $C_i$, $w_i$ denote the empirical mean, covariance matrix, and mixing weight of the $i$-th component. We prove that a very simple algorithm, namely spectral projection followed by single-linkage clustering, properly classifies every point in the sample when each $\mu_i$ is separated from every $\mu_j$ by $\|C_i\|_2^{1/2}\,(1/w_i + 1/w_j)^{1/2}$ plus a term that depends on the concentration properties of the distributions in the mixture. This second term is very small for many distributions, including Gaussian and log-concave distributions, among many others. As a result, we obtain the best known bounds for learning mixtures of arbitrary Gaussians in terms of the required mean separation. On the other hand, we prove that given any $k$ means $\mu_i$ and mixing weights $w_i$, there are (many) sets of covariance matrices $C_i$ such that each $\mu_i$ is separated from all $\mu_j$ by...
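To make the two-step algorithm named above concrete, here is a minimal sketch, not the paper's implementation: project the sample onto its top-$k$ singular subspace, then run single-linkage clustering in the projected space. The function name spectral_single_linkage, the use of NumPy/SciPy, and the synthetic two-component Gaussian mixture in the usage example are all illustrative assumptions, not details from the paper.

```python
# Sketch of "spectral projection followed by single-linkage clustering".
# Assumptions (not from the paper): NumPy/SciPy, Euclidean distances,
# and a synthetic well-separated Gaussian mixture for the demo.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def spectral_single_linkage(X: np.ndarray, k: int) -> np.ndarray:
    """Cluster an (n, d) mixture-sample X into k groups.

    Step 1 (spectral projection): project X onto the span of its
    top-k right singular vectors, the rank-k subspace that best
    captures the component means under the separation condition.
    Step 2 (single linkage): repeatedly merge the two clusters with
    the closest pair of points until only k clusters remain.
    """
    # Top-k right singular vectors of the sample matrix.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    X_proj = X @ Vt[:k].T  # (n, k) projected sample

    # Single-linkage agglomerative clustering on the projected points.
    Z = linkage(X_proj, method="single")
    return fcluster(Z, t=k, criterion="maxclust")  # labels in 1..k

# Illustrative usage on a synthetic two-component spherical Gaussian
# mixture with well-separated means (hypothetical parameters).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=+5.0, scale=1.0, size=(200, 50)),
    rng.normal(loc=-5.0, scale=1.0, size=(200, 50)),
])
labels = spectral_single_linkage(X, k=2)
```

The intuition, per the abstract's claim, is that once the means satisfy the separation condition, projection onto the top-$k$ spectral subspace brings points of the same component closer to one another than to any point of another component, at which point single linkage recovers the components exactly.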