Exploiting Unlabeled Data for Improving Accuracy of Predictive Data Mining

Predictive data mining typically relies on labeled data without exploiting a much larger amount of available unlabeled data. The goal of this paper is to show that using unlabeled data can be beneficial in a range of important prediction problems and therefore should be an integral part of the learning process. Given an unlabeled dataset representative of the underlying distribution and a K-class labeled sample that might be biased, our approach is to learn K contrast classifiers each trained to discriminate a certain class of labeled data from the unlabeled population. We illustrate that contrast classifiers can be useful in one-class classification, outlier detection, density estimation, and learning from biased data. The advantages of the proposed approach are demonstrated by an extensive evaluation on synthetic data followed by real-life bioinformatics applications for (1) ranking PubMed articles by their relevance to protein disorder and (2) cost-effective enlargement of a disord...
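The abstract describes training one contrast classifier per labeled class against the unlabeled pool. The Python sketch below illustrates that idea under assumptions of its own, not the authors' implementation. The base learner is plain logistic regression fit by gradient descent, because the abstract does not name a classifier. Positives and negatives are reweighted to equal total weight, and the data are synthetic 2-D Gaussians. With that balancing, a well-fit classifier's odds p/(1-p) approximate the density ratio f_k(x)/f_u(x) between class k and the unlabeled population. This is one plausible way to read the density-estimation and outlier-detection uses listed above. All function names (`train_logistic`, `fit_contrast_classifiers`, `density_ratio`, `classify`) are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], float]  # (weights, bias) of a linear logit


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(model: Model, x: Vector) -> float:
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def predict(model: Model, x: Vector) -> float:
    """Contrast-classifier output: estimated P(labeled class | x)."""
    return sigmoid(logit(model, x))


def train_logistic(pos: List[Vector], neg: List[Vector],
                   lr: float = 1.0, epochs: int = 300) -> Model:
    """Logistic regression fit by batch gradient descent (assumed base learner).

    Each side gets total weight 0.5, so the fit acts as if the labeled and
    unlabeled samples were the same size (a balanced contrast).
    """
    dim = len(pos[0])
    w, b = [0.0] * dim, 0.0
    data = ([(x, 1.0, 0.5 / len(pos)) for x in pos] +
            [(x, 0.0, 0.5 / len(neg)) for x in neg])
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y, weight in data:
            err = predict((w, b), x) - y  # d(log-loss)/d(logit) for this point
            for j in range(dim):
                gw[j] += weight * err * x[j]
            gb += weight * err
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def fit_contrast_classifiers(labeled: Dict[str, List[Vector]],
                             unlabeled: List[Vector]) -> Dict[str, Model]:
    """One model per class k: labeled class-k points (1) vs. unlabeled (0)."""
    return {k: train_logistic(xs, unlabeled) for k, xs in labeled.items()}


def density_ratio(model: Model, x: Vector) -> float:
    """p / (1 - p) = exp(logit): estimate of f_k(x) / f_u(x) under balancing."""
    return math.exp(logit(model, x))


def classify(models: Dict[str, Model], x: Vector) -> str:
    """Pick the class whose contrast classifier scores x highest."""
    return max(models, key=lambda k: predict(models[k], x))


if __name__ == "__main__":
    rng = random.Random(0)

    def sample(cx: float, n: int) -> List[List[float]]:
        return [[rng.gauss(cx, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n)]

    # Hypothetical data: the unlabeled pool is an even mix of both classes.
    unlabeled = sample(2.0, 150) + sample(-2.0, 150)
    labeled = {"A": sample(2.0, 50), "B": sample(-2.0, 50)}
    models = fit_contrast_classifiers(labeled, unlabeled)
    print(classify(models, [2.5, 0.3]), classify(models, [-1.8, -0.4]))
```

All K classifiers share the same unlabeled density f_u. For well-fit models, picking the largest contrast output is therefore the same as picking the class with the largest f_k(x), which amounts to an equal-prior decision. The linear logit is only a stand-in. It cannot isolate points that lie far from every class, so outlier scoring would likely call for a more flexible learner.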

Kang Peng, Slobodan Vucetic, Bo Han, Hongbo Xie, Zoran Obradovic

Available Unlabeled Data | Contrast Classifiers | Data Mining | ICDM 2003 | Unlabeled Data

» Semisupervised Prediction of Protein Interaction Sentences Exploiting Semantically Encoded...

» Towards Cooperative Predictive Data Mining in Competitive Environments

» On the Role of Local Matching for Efficient Semisupervised Protein Sequence Classification

» Data mining techniques to improve forecast accuracy in airline business

» Combining Data and Text Mining Techniques for Yeast Gene Regulation Prediction A Case Stud...

» Combining clustering and cotraining to enhance text classification using unlabelled data

» A New Data Selection Principle for SemiSupervised Incremental Learning

» CBC Clustering Based Text Classification Requiring Minimal Labeled Data

Post Info

Added	04 Jul 2010
Updated	04 Jul 2010
Type	Conference
Year	2003
Where	ICDM
Authors	Kang Peng, Slobodan Vucetic, Bo Han, Hongbo Xie, Zoran Obradovic

Sciweavers