Straightforward Feature Selection for Scalable Latent Semantic Indexing.

Latent Semantic Indexing (LSI) has been shown to be effective on many small-scale text collections. However, there is little evidence of its effectiveness on unsampled large-scale text corpora, owing to its high computational complexity. In this paper, we propose a straightforward feature selection strategy, named Feature Selection for Latent Semantic Indexing (FSLSI), as a preprocessing step so that LSI can be efficiently approximated on large-scale text corpora. We formulate LSI as a continuous optimization problem and propose to optimize its objective function by discrete optimization, which leads to the FSLSI algorithm. We show that the closed-form solution of this optimization is as simple as scoring each feature by its Frobenius norm and filtering out the features with small scores. Theoretical analysis guarantees that the loss incurred by the features FSLSI filters out is minimized with respect to approximating LSI. Thus we offer a general way for studying and applying LSI on larg...
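
The scoring-and-filtering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a term-document matrix stored with one row per term, assumes that the score of a term is the norm of its row (for a single row the Frobenius norm reduces to the Euclidean norm), and uses hypothetical names (fslsi_scores, select_features, keep) and a made-up toy matrix purely for illustration.

```python
import math


def fslsi_scores(term_doc):
    """Score each term (row) by the norm of its row of weights.

    Assumption: the per-feature score is the Frobenius norm of the
    term's row, which for a single row is its Euclidean norm.
    """
    return [math.sqrt(sum(w * w for w in row)) for row in term_doc]


def select_features(term_doc, keep):
    """Return the sorted row indices of the `keep` highest-scoring terms."""
    scores = fslsi_scores(term_doc)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


# Toy term-document matrix (illustrative data only):
# 4 terms (rows) x 3 documents (columns).
X = [
    [3.0, 0.0, 4.0],
    [0.1, 0.0, 0.0],
    [1.0, 2.0, 2.0],
    [0.0, 0.2, 0.0],
]

kept = select_features(X, keep=2)       # indices of the retained terms
X_reduced = [X[i] for i in kept]        # matrix that LSI would then be run on
```

In this sketch the low-scoring rows are dropped before any decomposition is computed, so a subsequent LSI step (for example a truncated SVD) would operate on the smaller matrix X_reduced rather than on the full vocabulary.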

Jun Yan, Shuicheng Yan, Ning Liu, Zheng Chen

Computer Science | Large Scale Corpus | Large Scale Text | Scale Text Corpus | SDM 2009

Post Info

Added	07 Mar 2010
Updated	07 Mar 2010
Type	Conference
Year	2009
Where	SDM
Authors	Jun Yan, Shuicheng Yan, Ning Liu, Zheng Chen
