Latent Semantic Indexing (LSI) has been shown to be effective on many small-scale text collections. However, there is little evidence of its effectiveness on unsampled large-scale text corpora, owing to its high computational complexity. In this paper, we propose a straightforward feature selection strategy, named Feature Selection for Latent Semantic Indexing (FSLSI), as a preprocessing step so that LSI can be efficiently approximated on large-scale text corpora. We formulate LSI as a continuous optimization problem and propose optimizing its objective function via discrete optimization, which leads to the FSLSI algorithm. We show that the closed-form solution of this optimization is as simple as scoring each feature by its Frobenius norm and filtering out those with small scores. Theoretical analysis guarantees that the loss incurred by the features FSLSI filters out is minimized with respect to approximating LSI. We thus offer a general way of studying and applying LSI on large-scale text corpora.
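The pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each feature is a row (term) of the term-document matrix, so the Frobenius norm of a single-feature submatrix reduces to that row's Euclidean norm, and the function names (`fslsi_select`, `lsi`) and the choice of keeping the top-k features are hypothetical.

```python
import numpy as np

def fslsi_select(X, k):
    """Score each feature (row of term-document matrix X) by the
    Frobenius norm of its row and keep the k highest-scoring ones."""
    scores = np.linalg.norm(X, axis=1)          # per-row norm = Frobenius norm of a 1-row submatrix
    keep = np.argsort(scores)[::-1][:k]         # indices of the k largest scores
    return np.sort(keep)

def lsi(X, rank):
    """Standard LSI: rank-truncated SVD of the (reduced) matrix."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank], s[:rank], Vt[:rank]

# Toy term-document matrix: 4 terms (features) x 3 documents.
X = np.array([[3.0, 0.0, 1.0],
              [0.0, 2.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 0.0, 0.1]])   # low-norm feature, filtered out first

keep = fslsi_select(X, k=3)       # drops the weakest feature (row 3)
U, s, Vt = lsi(X[keep], rank=2)   # approximate LSI on the reduced matrix
```

The point of the preprocessing step is that the expensive SVD is computed on a matrix with far fewer rows, while the theory bounds the approximation loss from the discarded low-norm features.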