Using Web Searches on Important Words to Create Background Sets for LSI Classification



The World Wide Web holds a wealth of information related to almost any text classification task. This paper presents a method for mining the web to improve text classification by creating a background text set. Our algorithm uses the information gain criterion to create lists of important words for each class of a text categorization problem. It then searches the web on various combinations of these words to produce a set of related data. We use this background text with Latent Semantic Indexing (LSI) classification to create an expanded term-by-document matrix, on which singular value decomposition is performed. Empirical results show that this approach improves accuracy on unseen test examples in different domains.
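The first two steps of the abstract (ranking words by information gain, then combining the top words into search queries) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names `information_gain`, `important_words`, and `build_queries`, the one-vs-rest treatment of each class, the restriction of candidates to words that occur in the target class, and the query size are all assumptions made for illustration. The web search itself and the LSI step are omitted.

```python
import math
from itertools import combinations


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, word, target):
    """Information gain of the presence of `word` for `target` vs. all other classes.

    `docs` is a list of token sets and `labels` the matching class labels.
    The one-vs-rest split is an assumption of this sketch.
    """
    pos_with = pos_without = neg_with = neg_without = 0
    for tokens, label in zip(docs, labels):
        present = word in tokens
        if label == target:
            pos_with += present
            pos_without += not present
        else:
            neg_with += present
            neg_without += not present
    n = len(docs)
    n_with = pos_with + neg_with
    n_without = pos_without + neg_without
    prior = entropy([pos_with + pos_without, neg_with + neg_without])
    conditional = (n_with / n) * entropy([pos_with, neg_with]) + (
        n_without / n
    ) * entropy([pos_without, neg_without])
    return prior - conditional


def important_words(docs, labels, target, k):
    """Top-k words for `target`, ranked by information gain (ties alphabetical).

    Assumption: only words that occur in at least one `target` document
    are considered as candidates.
    """
    candidates = set()
    for tokens, label in zip(docs, labels):
        if label == target:
            candidates |= set(tokens)
    ranked = sorted(
        candidates,
        key=lambda w: (-information_gain(docs, labels, w, target), w),
    )
    return ranked[:k]


def build_queries(words, size):
    """Join every combination of `size` important words into one query string."""
    return [" ".join(combo) for combo in combinations(words, size)]


# Toy labeled corpus (hypothetical data, for illustration only).
docs = [
    {"neural", "network", "training"},
    {"neural", "network", "layers"},
    {"protein", "gene", "sequence"},
    {"protein", "gene", "training"},
]
labels = ["ai", "ai", "bio", "bio"]

top_words = important_words(docs, labels, "ai", 3)
queries = build_queries(top_words, 2)
```

Each string in `queries` would be submitted to a search engine, and the returned pages would form the background set that is added to the term-by-document matrix before singular value decomposition.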

Sarah Zelikovitz, Marina Kogan


Artificial Intelligence | Background Text | FLAIRS 2006 | Text Classification | Text Classification Task


Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2006
Where	FLAIRS
Authors	Sarah Zelikovitz, Marina Kogan
