SDM
2009
SIAM

Straightforward Feature Selection for Scalable Latent Semantic Indexing.

Latent Semantic Indexing (LSI) has been shown to be effective on many small-scale text collections. However, there is little evidence of its effectiveness on unsampled large-scale text corpora, because of its high computational complexity. In this paper, we propose a straightforward feature selection strategy, named Feature Selection for Latent Semantic Indexing (FSLSI), as a preprocessing step so that LSI can be efficiently approximated on large-scale text corpora. We formulate LSI as a continuous optimization problem and propose to optimize its objective function as a discrete optimization problem, which leads to the FSLSI algorithm. We show that the closed-form solution of this optimization is as simple as scoring each feature by its Frobenius norm and filtering out the features with small scores. Theoretical analysis guarantees that the loss incurred by the features that FSLSI filters out is minimized for approximating LSI. Thus we offer a general way for studying and applying LSI on larg...
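
The scoring-and-filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a dense term-by-document matrix whose rows are features (terms), takes the score of a feature to be the Euclidean norm of its row (the Frobenius norm of a single row vector), and keeps a fixed number of top-scoring features. The function names and the parameters `k` and `rank` are hypothetical, and the paper's exact scoring or weighting may differ.

```python
import numpy as np


def select_features_by_norm(X, k):
    """Return (kept_row_indices, scores) for a terms x documents matrix X.

    Assumption: each row is one feature, scored by the norm of its row.
    The k highest-scoring rows are kept; indices are returned in ascending order.
    """
    scores = np.linalg.norm(X, axis=1)
    # Stable sort on negated scores: highest score first, ties keep row order.
    order = np.argsort(-scores, kind="stable")
    keep = np.sort(order[:k])
    return keep, scores


def lsi_embed(X, rank):
    """Plain LSI via truncated SVD: one rank-dimensional vector per document."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (s[:rank, None] * Vt[:rank]).T


# Toy term-document matrix: 4 terms (rows), 3 documents (columns).
X = np.array([
    [3.0, 4.0, 0.0],
    [0.0, 0.0, 0.1],
    [1.0, 0.0, 0.0],
    [0.0, 5.0, 12.0],
])

keep, scores = select_features_by_norm(X, k=2)
X_reduced = X[keep]                      # filter out low-scoring features
doc_vectors = lsi_embed(X_reduced, rank=2)
```

In this sketch the SVD runs only on the reduced matrix, which is where any saving on a large corpus would come from; a realistic setting would use a sparse matrix and a sparse truncated SVD solver rather than the dense calls shown here.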
Jun Yan, Shuicheng Yan, Ning Liu, Zheng Chen
Added 07 Mar 2010
Updated 07 Mar 2010
Type Conference
Year 2009
Where SDM
Authors Jun Yan, Shuicheng Yan, Ning Liu, Zheng Chen