SIGIR
1995
ACM

Noise Reduction in a Statistical Approach to Text Categorization

This paper studies noise reduction for computational efficiency improvements in a statistical learning method for text categorization, the Linear Least Squares Fit (LLSF) mapping. Multiple noise reduction strategies are proposed and evaluated, including: an aggressive removal of “non-informative words” from texts before training; the use of a truncated singular value decomposition to cut off noisy “latent semantic structures” during training; and the elimination of non-influential components in the LLSF solution (a word-concept association matrix) after training. Text collections in different domains were used for evaluation. The test results showed significant improvements in computational efficiency with no loss of categorization accuracy.
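The three noise reduction stages named in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation. It assumes that LLSF seeks a matrix F minimizing ||FA - B|| over training pairs, with A a words-by-documents matrix and B a categories-by-documents matrix, and that F can be computed from a rank-k pseudoinverse of A. The function names, the document-frequency cutoff used as the word filter, the magnitude threshold used for pruning, and the toy data are all hypothetical stand-ins chosen for illustration.

```python
import numpy as np


def drop_rare_words(A, min_doc_freq):
    """Before training: keep only words (rows) found in >= min_doc_freq documents.

    Assumption: document frequency is a stand-in criterion, not the paper's
    definition of a "non-informative word".
    """
    doc_freq = np.count_nonzero(A, axis=1)
    keep = doc_freq >= min_doc_freq
    return A[keep], keep


def llsf_fit(A, B, k):
    """During training: least squares fit using a rank-k truncated SVD of A.

    Assumes the k largest singular values of A are nonzero.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k]
    A_pinv_k = Vt_k.T @ np.diag(1.0 / s_k) @ U_k.T  # rank-k pseudoinverse
    return B @ A_pinv_k


def prune_small(F, threshold):
    """After training: zero out low-magnitude entries of the solution matrix."""
    return np.where(np.abs(F) >= threshold, F, 0.0)


# Toy data (hypothetical): 5 words x 4 documents, 2 categories x 4 documents.
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 2.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],  # occurs in one document only, so it is dropped
])
B = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
])

A_kept, keep = drop_rare_words(A, min_doc_freq=2)
F = llsf_fit(A_kept, B, k=2)
F_sparse = prune_small(F, threshold=0.1)

# Score a new document: apply the same word mask, then map words to categories.
new_doc = np.array([1.0, 1.0, 0.0, 0.0, 0.0])
scores = F_sparse @ new_doc[keep]
predicted_category = int(np.argmax(scores))
```

Each stage shrinks a different object: the word filter reduces the rows of A, the rank k limits how much of the SVD is used, and the pruning step makes F sparser so that scoring a new document touches fewer entries.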
Yiming Yang
Added 26 Aug 2010
Updated 26 Aug 2010
Type Conference
Year 1995
Where SIGIR
Authors Yiming Yang