DASFAA
2004
IEEE

Semi-supervised Text Classification Using Partitioned EM

Text classification using a small labeled set and a large unlabeled set is seen as a promising way to reduce the labor-intensive and time-consuming effort of labeling training data for accurate classifiers, since unlabeled data is easy to obtain from the Web. In [16] it was demonstrated that an unlabeled set significantly improves classification accuracy when only a small labeled training set is available. However, the Bayesian method used in [16] assumes that text documents are generated from a mixture model and that there is a one-to-one correspondence between the mixture components and the classes. This does not hold in many real-life applications, where a class may cover documents from many different topics, violating the one-to-one correspondence assumption. In such cases, the resulting classifiers can be quite poor. In this paper, we propose a clustering-based partitioning technique to solve the problem. This method first partitions the training docu...
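To make the baseline assumption concrete, the following is a minimal sketch of semi-supervised EM with a multinomial naive Bayes model in which each class corresponds to exactly one mixture component, as the abstract describes for [16]. It is not the partitioned method proposed in the paper, whose description is truncated above. The function names, the smoothing parameter `alpha`, the iteration count, and the toy word-count matrices are all assumptions made for illustration.

```python
import numpy as np

def train_nb(X, R, alpha=1.0):
    """M-step: fit class priors and word probabilities from soft class
    weights R (documents x classes), with add-alpha smoothing (assumed)."""
    prior = (R.sum(axis=0) + alpha) / (R.sum() + alpha * R.shape[1])
    word_counts = R.T @ X + alpha                    # classes x vocabulary
    word_prob = word_counts / word_counts.sum(axis=1, keepdims=True)
    return np.log(prior), np.log(word_prob)

def posterior(X, log_prior, log_word_prob):
    """E-step: P(class | document) under the current model."""
    log_joint = X @ log_word_prob.T + log_prior
    log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(log_joint)
    return p / p.sum(axis=1, keepdims=True)

def semi_supervised_em(X_lab, y_lab, X_unl, n_classes, n_iter=10):
    """One mixture component per class: labeled documents keep their hard
    labels, unlabeled documents receive soft labels that are re-estimated
    on every iteration."""
    R_lab = np.eye(n_classes)[y_lab]                 # one-hot labels
    log_prior, log_word_prob = train_nb(X_lab, R_lab)
    X_all = np.vstack([X_lab, X_unl])
    for _ in range(n_iter):
        R_unl = posterior(X_unl, log_prior, log_word_prob)
        R_all = np.vstack([R_lab, R_unl])
        log_prior, log_word_prob = train_nb(X_all, R_all)
    return log_prior, log_word_prob

# Hypothetical toy data: rows are documents, columns are word counts.
X_lab = np.array([[3, 0, 1, 0], [0, 3, 0, 1]])
y_lab = np.array([0, 1])
X_unl = np.array([[2, 0, 2, 0], [0, 1, 0, 3], [4, 1, 0, 0], [0, 2, 1, 2]])

log_prior, log_word_prob = semi_supervised_em(X_lab, y_lab, X_unl, n_classes=2)
probs = posterior(X_unl, log_prior, log_word_prob)
pred = probs.argmax(axis=1)
print(pred)
```

When a class actually spans several topics, a single word distribution per class fits it poorly, which is the failure mode the abstract attributes to the one-to-one correspondence assumption.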
Gao Cong, Wee Sun Lee, Haoran Wu, Bing Liu
Type Conference
Year 2004
Where DASFAA