FLAIRS
2006

Using Web Searches on Important Words to Create Background Sets for LSI Classification

The World Wide Web holds a wealth of information related to almost any text classification task. This paper presents a method for mining the web to improve text classification by creating a set of background text. Our algorithm uses the information gain criterion to create a list of important words for each class of a text categorization problem. It then searches the web on various combinations of these words to produce a set of related data. We use this background text with Latent Semantic Indexing (LSI) classification to create an expanded term-by-document matrix, on which singular value decomposition is performed. We provide empirical results showing that this approach improves accuracy on unseen test examples in different domains.
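The word-selection step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy documents, the function names (`information_gain`, `important_words`, `build_queries`), the rule that assigns each word to the class whose documents contain it most often, and the use of word pairs as queries are all assumptions made for illustration. The sketch only builds query strings; it does not call any search engine and does not cover the LSI step.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0) if total else 0.0


def information_gain(docs, labels, word):
    """Information gain of the binary feature 'word occurs in the document'."""
    base = entropy(list(Counter(labels).values()))
    present = [y for d, y in zip(docs, labels) if word in d]
    absent = [y for d, y in zip(docs, labels) if word not in d]
    n = len(docs)
    conditional = sum(
        len(part) / n * entropy(list(Counter(part).values()))
        for part in (present, absent)
        if part
    )
    return base - conditional


def important_words(docs, labels, per_class=2):
    """Rank words by information gain and keep the top few for each class.

    Assumption: a word belongs to the class whose documents contain it most often.
    """
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda w: (-information_gain(docs, labels, w), w))
    lists = {c: [] for c in sorted(set(labels))}
    for w in ranked:
        doc_freq = Counter(y for d, y in zip(docs, labels) if w in d)
        cls = max(sorted(doc_freq), key=lambda c: doc_freq[c])
        if len(lists[cls]) < per_class:
            lists[cls].append(w)
    return lists


def build_queries(words, size=2):
    """Form search queries from combinations of important words (hypothetical scheme)."""
    return [" ".join(combo) for combo in combinations(words, size)]


# Toy labeled training set: each document is a set of tokens (made-up data).
docs = [
    {"neural", "network", "training"},
    {"neural", "gradient", "training"},
    {"protein", "gene", "sequence"},
    {"protein", "cell", "gene"},
]
labels = ["ml", "ml", "bio", "bio"]

word_lists = important_words(docs, labels, per_class=2)
queries = {cls: build_queries(words) for cls, words in word_lists.items()}
print(word_lists)
print(queries)
```

In a full pipeline, the text returned for each query would be added as extra columns of the term-by-document matrix before the singular value decomposition is computed.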
Sarah Zelikovitz, Marina Kogan
Added: 31 Oct 2010
Updated: 31 Oct 2010
Type: Conference
Year: 2006
Where: FLAIRS
Authors: Sarah Zelikovitz, Marina Kogan