A Study of Semi-discrete Matrix Decomposition for LSI in Automated Text Categorization

Abstract. This paper proposes using Latent Semantic Indexing (LSI), computed with the semi-discrete matrix decomposition (SDD), for text categorization. SDD is a recent approach to LSI that can achieve similar performance at a much lower storage cost. Here, LSI is applied to text categorization by constructing new category features as combinations or transformations of the original features. In experiments on a Chinese Library Classification data set, we compare accuracy against a classifier based on k-Nearest Neighbor (k-NN), and the results show that k-NN based on LSI is sometimes significantly better. Much future work remains, but the results indicate that LSI is a promising technique for text categorization.
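To make the SDD idea in the abstract concrete, here is a minimal pure-Python sketch of a greedy rank-one semi-discrete decomposition, A ≈ Σ d_k x_k y_kᵀ, where every entry of x_k and y_k is restricted to -1, 0, or 1. It follows the alternating scheme commonly described for SDD and is not the authors' implementation. The function names (`best_discrete`, `sdd`, `frob`), the starting-vector rule, the fixed inner iteration count, and the toy term-document matrix `A` are all assumptions made for illustration.

```python
import math

def best_discrete(s):
    """Return x in {-1, 0, 1}^n that maximizes (x . s)^2 / (x . x)."""
    # The optimum keeps the J largest |s_i| (with matching signs) for some J.
    order = sorted(range(len(s)), key=lambda i: abs(s[i]), reverse=True)
    best_j, best_val, running = 1, -1.0, 0.0
    for j, i in enumerate(order, start=1):
        running += abs(s[i])
        val = running * running / j
        if val > best_val:
            best_j, best_val = j, val
    x = [0] * len(s)
    for i in order[:best_j]:
        x[i] = 1 if s[i] >= 0 else -1
    return x

def frob(mat):
    """Frobenius norm of a list-of-lists matrix."""
    return math.sqrt(sum(v * v for row in mat for v in row))

def sdd(a, rank, inner_iters=10):
    """Greedy SDD sketch: returns (xs, ds, ys, residual)."""
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]          # residual, starts as a copy of A
    xs, ds, ys = [], [], []
    for k in range(rank):
        y = [0] * n
        y[k % n] = 1                   # assumed starting vector: cycle unit vectors
        for _ in range(inner_iters):   # alternate: fix y and solve x, then fix x and solve y
            s = [sum(r[i][j] * y[j] for j in range(n)) for i in range(m)]
            x = best_discrete(s)
            t = [sum(r[i][j] * x[i] for i in range(m)) for j in range(n)]
            y = best_discrete(t)
        xry = sum(x[i] * r[i][j] * y[j] for i in range(m) for j in range(n))
        d = xry / (sum(v * v for v in x) * sum(v * v for v in y))
        for i in range(m):             # subtract the rank-one term from the residual
            for j in range(n):
                r[i][j] -= d * x[i] * y[j]
        xs.append(x)
        ds.append(d)
        ys.append(y)
    return xs, ds, ys, r

# Hypothetical 5-term x 4-document count matrix, for illustration only.
A = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
xs, ds, ys, residual = sdd(A, rank=3)
print("relative residual:", frob(residual) / frob(A))
```

Because each x_k and y_k holds only three possible values, a vector entry needs about two bits, plus one real weight d_k per term. This is the source of the storage saving the abstract mentions. Classifying with k-NN would then compare documents in the reduced representation rather than in the original term space.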

Qiang Wang, Xiaolong Wang, Guan Yi

IJCNLP 2004 | Latent Semantic Indexing | Semi-discrete Matrix Decomposition | Text Categorization

Post Info

Added	02 Jul 2010
Updated	02 Jul 2010
Type	Conference
Year	2004
Where	IJCNLP
Authors	Qiang Wang, Xiaolong Wang, Guan Yi