Given a set of data points drawn from multiple low-dimensional linear subspaces of a high-dimensional space, we consider the problem of clustering these points according to the subspaces they belong to. Our approach exploits the fact that each data point can be written as a sparse linear combination of all the other points. When the subspaces are independent, the sparse coefficients can be found by solving a linear program. However, when the subspaces are disjoint, but not independent, the problem becomes more challenging. In this paper, we derive theoretical bounds relating the principal angles between the subspaces and the distribution of the data points across all the subspaces under which the coefficients are guaranteed to be sparse. The clustering of the data is then easily obtained from the sparse coefficients. We illustrate the validity of our results through simulation experiments.
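The sparse-representation step for independent subspaces can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: it assumes the linear program is the standard ℓ1-minimization problem, min ‖c‖₁ subject to Y₋ᵢ c = yᵢ, solved once per point with SciPy's `linprog`. The function name `sparse_coefficients`, the synthetic data (two random 2-dimensional subspaces of a 4-dimensional space, 8 points each), and the random seed are all hypothetical choices made for the example.

```python
import numpy as np
from scipy.optimize import linprog


def sparse_coefficients(Y):
    """For each column y_i of Y, solve  min ||c||_1  s.t.  Y_{-i} c = y_i.

    Returns an N x N matrix C whose i-th column holds the coefficients
    that express point i in terms of the other points (C[i, i] = 0).
    """
    D, N = Y.shape
    C = np.zeros((N, N))
    for i in range(N):
        others = np.delete(np.arange(N), i)
        A = Y[:, others]
        # Split c = u - v with u, v >= 0, so that ||c||_1 = sum(u) + sum(v)
        # at the optimum and the problem becomes a linear program.
        res = linprog(
            c=np.ones(2 * (N - 1)),
            A_eq=np.hstack([A, -A]),
            b_eq=Y[:, i],
            bounds=(0, None),
            method="highs",
        )
        if not res.success:
            raise RuntimeError(f"LP failed for point {i}: {res.message}")
        C[others, i] = res.x[: N - 1] - res.x[N - 1:]
    return C


# Synthetic data (illustrative assumption): two random 2-D subspaces of R^4,
# which are independent with probability one.
rng = np.random.default_rng(0)
dim, n_per = 2, 8
bases = [rng.standard_normal((4, dim)) for _ in range(2)]
Y = np.hstack([B @ rng.standard_normal((dim, n_per)) for B in bases])
labels = np.repeat([0, 1], n_per)

C = sparse_coefficients(Y)

# Entries of C that link points from different subspaces.
cross = labels[:, None] != labels[None, :]
max_cross = np.abs(C[cross]).max()
print("largest cross-subspace coefficient:", max_cross)
```

In the independent case, the coefficients linking points from different subspaces should vanish up to solver tolerance. One way to turn C into a clustering, assumed here rather than taken from the text, is to form the symmetric affinity |C| + |C|ᵀ and pass it to a graph-based method such as spectral clustering.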