
SDM
2004
SIAM

Classifying Documents Without Labels

Automatic classification of documents is an important area of research with many applications in document search, forensics, and other fields. Text classification methods rely on the existence of a sample of documents whose class labels are known. However, in many situations, obtaining this sample may be difficult or even impossible. Consider, for instance, a set of documents returned as the result of a query. If we want to separate the documents that are truly relevant to the query from those that are not, it is unlikely that we will have labelled documents at hand to train classification models for this task. In this paper we focus on the classification of an unlabelled set of documents into two classes, relevant and irrelevant, given a topic of interest. By dividing the set of documents into buckets (for instance, answers returned by different search engines), and using association rule mining to find common sets of words among the buckets...
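The bucket idea mentioned at the end of the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract is cut off before it says how the common word sets are used, so the code below only shows one plausible reading of "common sets of words among the buckets". It finds word sets that are frequent within each bucket and keeps those shared by every bucket. The function names, the `min_support` threshold, the `max_size` limit, and the toy documents are all assumptions made for illustration, and the brute-force enumeration stands in for a real association rule miner such as Apriori.

```python
from itertools import combinations


def frequent_word_sets(docs, min_support, max_size=2):
    """Return word sets contained in at least `min_support` (a fraction)
    of the documents in one bucket. Hypothetical helper, brute force."""
    doc_sets = [set(d.lower().split()) for d in docs]
    vocab = sorted(set().union(*doc_sets)) if doc_sets else []
    frequent = set()
    for size in range(1, max_size + 1):
        for combo in combinations(vocab, size):
            # Count the documents that contain every word in the candidate set.
            count = sum(1 for s in doc_sets if s.issuperset(combo))
            if count / len(doc_sets) >= min_support:
                frequent.add(frozenset(combo))
    return frequent


def common_word_sets(buckets, min_support=0.5, max_size=2):
    """Keep only the word sets that are frequent in every bucket."""
    per_bucket = [frequent_word_sets(b, min_support, max_size) for b in buckets]
    return set.intersection(*per_bucket) if per_bucket else set()


# Toy data (assumed): each bucket stands for one search engine's answers.
buckets = [
    ["jaguar car engine", "jaguar car dealer", "jaguar cat habitat"],
    ["jaguar car price", "car engine jaguar review", "jungle cat"],
]
common = common_word_sets(buckets, min_support=0.6)
for word_set in sorted(common, key=lambda s: (len(s), sorted(s))):
    print(sorted(word_set))
```

Requiring a word set to be frequent in every bucket is a deliberately strict choice for this sketch; a looser rule (for example, frequent in most buckets) would be an equally reasonable assumption.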
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2004
Where SDM
Authors Daniel Barbará, Carlotta Domeniconi, Ning Kang