
FLAIRS
2006

Corpus Based Unsupervised Labeling of Documents

Text categorization involves mapping documents to a fixed set of labels. A related and equally important problem is that of assigning labels to large corpora. With a deluge of documents from sources like the World Wide Web, manual labeling by domain experts is prohibitively expensive. The problem of reducing the effort of labeling documents has been investigated extensively, but most of this work involved some kind of supervised or semi-supervised learning, which still requires labeled training data. This motivates the need for automatic methods of annotating documents with labels. In this work we explore a novel method of assigning labels to documents without using any training data. The proposed method uses clustering to build semantically related sets that serve as candidate labels for documents. This technique could be used for labeling large corpora in an unattended fashion.
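The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea (cluster terms into related sets, then use those sets as candidate labels), not the authors' method. The stopword list, the Jaccard threshold, the greedy leader clustering, and all function names (`tokenize`, `cluster_terms`, `label_documents`) are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumption: a tiny illustrative stopword list, not taken from the paper.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "is", "for", "on", "by"}


def tokenize(text):
    """Lowercase the text, keep alphabetic runs, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def cluster_terms(docs, threshold=0.5):
    """Greedy leader clustering of terms by the documents they occur in.

    Terms that appear in similar sets of documents are treated as related.
    This stands in for whatever clustering the paper actually uses.
    """
    postings = defaultdict(set)  # term -> ids of documents containing it
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            postings[term].add(doc_id)

    clusters = []  # each entry: (document set of the first term, set of terms)
    for term in sorted(postings):
        for leader_docs, terms in clusters:
            if jaccard(postings[term], leader_docs) >= threshold:
                terms.add(term)
                break
        else:
            # No existing cluster is similar enough, so start a new one.
            clusters.append((postings[term], {term}))
    return [terms for _, terms in clusters]


def label_documents(docs, clusters):
    """Assign each document the term cluster that overlaps its tokens most."""
    labels = []
    for text in docs:
        tokens = set(tokenize(text))
        best = max(clusters, key=lambda c: len(c & tokens))
        labels.append(sorted(best))
    return labels


if __name__ == "__main__":
    corpus = [
        "cricket bat ball",
        "cricket ball wicket",
        "stock market shares",
        "stock shares bank",
    ]
    term_sets = cluster_terms(corpus)
    for doc, label in zip(corpus, label_documents(corpus, term_sets)):
        print(doc, "->", label)
```

No training labels are used anywhere: the candidate labels come entirely from co-occurrence structure in the corpus itself, which is the property the abstract emphasizes. A realistic system would need a stronger notion of semantic relatedness and a way to pick a readable name for each term set.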
Delip Rao, Deepak P, Deepak Khemani
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2006
Where FLAIRS