Classification on Data with Biased Class Distribution

Labeled data for classification is often obtained by sampling that restricts or favors the choice of certain classes. A classifier trained on such data will be biased, resulting in incorrect inference and suboptimal classification of new data. Given an unlabeled new data set, we propose a bootstrap method to estimate its class probabilities using an estimate of the classifier's accuracy on the training data and an estimate of the probabilities of the classifier's predictions on the new data. We then propose two methods to improve classification accuracy on new data. The first applies only if the classifier was designed to predict posterior class probabilities; the predictions of the existing classifier are adjusted according to the estimated class probabilities of the new data. The second can be applied to an arbitrary classification algorithm, but it requires retraining on properly resampled data. The proposed bootstrap algorithm was validated through experiments wit...
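The two ideas in the abstract, estimating the class probabilities of new data and then adjusting posterior predictions, can be illustrated with a minimal Python sketch. This is not the authors' bootstrap algorithm, which the abstract does not detail. It assumes a standard prior-shift correction: the distribution of predictions on new data is modeled as the training confusion matrix applied to the unknown new class probabilities, and posteriors are rescaled by the ratio of new to training class probabilities. The function names `estimate_new_priors` and `adjust_posteriors` are hypothetical.

```python
import numpy as np

def estimate_new_priors(conf_matrix, pred_freq):
    """Estimate class probabilities of new data (illustrative assumption).

    conf_matrix[i, j]: estimate of P(predicted = j | true = i) on training data.
    pred_freq[j]: fraction of new examples the classifier assigns to class j.
    Model: pred_freq = conf_matrix.T @ new_priors, solved by least squares.
    """
    p, *_ = np.linalg.lstsq(conf_matrix.T, pred_freq, rcond=None)
    p = np.clip(p, 0.0, None)  # estimates can go slightly negative; clip them
    return p / p.sum()         # renormalize so the probabilities sum to 1

def adjust_posteriors(posteriors, train_priors, new_priors):
    """Rescale posterior predictions to the new class probabilities.

    posteriors: array of shape (n_examples, n_classes) from the biased classifier.
    Each row is multiplied by new_priors / train_priors and renormalized.
    """
    weighted = posteriors * (new_priors / train_priors)
    return weighted / weighted.sum(axis=1, keepdims=True)

# Hypothetical two-class example with made-up numbers.
conf = np.array([[0.9, 0.1],
                 [0.2, 0.8]])             # per-class accuracy on training data
pred_freq = np.array([0.41, 0.59])        # prediction frequencies on new data
new_priors = estimate_new_priors(conf, pred_freq)
adjusted = adjust_posteriors(np.array([[0.5, 0.5]]),
                             np.array([0.5, 0.5]),
                             new_priors)
```

In a bootstrap variant, one would presumably repeat the estimate over resampled data and aggregate the results, but that step is left out here because the source gives no specifics.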

Slobodan Vucetic, Zoran Obradovic

Class Probabilities | Data Set | ECML 2001 | Machine Learning | Training Data |

Added	28 Jul 2010
Updated	28 Jul 2010
Type	Conference
Year	2001
Where	ECML
Authors	Slobodan Vucetic, Zoran Obradovic
