We extend the "Sparse LDA" algorithm of [7] with new sparsity bounds on 2-class separability and efficient partitioned matrix inverse techniques that yield 1000-fold speed-ups. These improvements mitigate the O(n^4) scaling that has limited the algorithm's applicability to vision problems, and they prioritize the less myopic backward elimination stage by making it faster than forward selection. Experiments include "sparse eigenfaces" and gender classification on FERET data, as well as pixel/part selection for OCR on MNIST data using Bayesian Gaussian process (GP) classification. Sparse LDA is an attractive alternative to the more demanding Automatic Relevance Determination. In all experiments, state-of-the-art recognition is obtained while discarding the majority of pixels. Our sparse models also show a better fit to the data in terms of the "evidence", or marginal likelihood.
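As a rough illustration of why partitioned inverses can make backward elimination cheap, the sketch below applies the standard block-inverse (Schur complement) identity: given the inverse of a matrix, the inverse of the same matrix with one row and column deleted follows in O(n^2) operations, with no fresh O(n^3) inversion. This is a minimal sketch of the general identity, not necessarily the scheme used in the paper; the function name `downdate_inverse` and the small test matrix are assumptions chosen for illustration.

```python
def downdate_inverse(B, k):
    """Given B = inverse(A), return the inverse of A with row and column k removed.

    Uses the block-inverse identity
        inverse(A without k) = B[keep, keep] - B[keep, k] * B[k, keep] / B[k, k],
    which costs O(n^2) per removed variable. Assumes B[k][k] is nonzero, which
    holds when A is symmetric positive definite (e.g. a covariance matrix).
    """
    n = len(B)
    keep = [i for i in range(n) if i != k]
    pivot = B[k][k]
    return [[B[i][j] - B[i][k] * B[k][j] / pivot for j in keep] for i in keep]


# Hypothetical 3x3 symmetric positive definite matrix and its exact inverse.
A = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 2.0]]
A_inv = [[0.75, -0.5, 0.25],
         [-0.5, 1.0, -0.5],
         [0.25, -0.5, 0.75]]

# Drop the middle variable (index 1). The remaining submatrix is [[2, 0], [0, 2]],
# so the downdated inverse should be [[0.5, 0], [0, 0.5]].
reduced_inv = downdate_inverse(A_inv, 1)
print(reduced_inv)
```

In a backward elimination loop, each candidate removal could be scored from such a downdated inverse, so the cost per step stays quadratic in the number of variables still selected.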