Learning to Identify Unexpected Instances in the Test Set

15 years 8 months ago

Download www.cs.uic.edu

Traditional classification involves building a classifier from labeled training examples of a set of predefined classes and then applying it to classify test instances into the same set of classes. In practice, this paradigm can be problematic because the test data may contain instances that do not belong to any of the previously defined classes. Detecting such unexpected instances in the test set is therefore an important problem. It can be formulated as learning from positive and unlabeled examples (PU learning). However, current PU learning algorithms require a large proportion of negative instances in the unlabeled set to be effective. This paper proposes a novel technique to solve this problem in the text classification domain. The technique first generates a single artificial negative document AN. The positive set P and {AN} are then used to build a naïve Bayesian classifier. Our experimental results show that this method is significantly better than existing tech...
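The two steps named in the abstract (generate one artificial negative document AN, then train a naïve Bayesian classifier on P and {AN}) can be illustrated with a minimal sketch. The abstract does not say how AN is generated, so the rule in `make_artificial_negative` is a hypothetical stand-in, not the authors' method: it keeps the words that are relatively more frequent in the unlabeled set than in P. The function names, the whitespace tokenizer, the Laplace smoothing, and the uniform class prior are likewise assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for a sketch.
    return text.lower().split()


def make_artificial_negative(positive_docs, unlabeled_docs):
    """Build one artificial negative document AN as a bag of words.

    Hypothetical heuristic (NOT taken from the paper): keep each word whose
    relative frequency in the unlabeled set exceeds its relative frequency in P.
    """
    p_counts = Counter(w for d in positive_docs for w in tokenize(d))
    u_counts = Counter(w for d in unlabeled_docs for w in tokenize(d))
    p_total = sum(p_counts.values()) or 1
    u_total = sum(u_counts.values()) or 1
    an = Counter()
    for word, count in u_counts.items():
        if count / u_total > p_counts[word] / p_total:
            an[word] = count
    return an


def train_naive_bayes(positive_docs, an_counts, prior_positive=0.5):
    """Train a two-class multinomial naive Bayes model on P and {AN}.

    Assumption: the class prior is a free parameter (uniform by default),
    because the negative class consists of a single artificial document.
    """
    p_counts = Counter(w for d in positive_docs for w in tokenize(d))
    vocab = set(p_counts) | set(an_counts)
    return {
        "vocab": vocab,
        "counts": {"positive": p_counts, "unexpected": an_counts},
        "totals": {
            "positive": sum(p_counts.values()),
            "unexpected": sum(an_counts.values()),
        },
        "log_prior": {
            "positive": math.log(prior_positive),
            "unexpected": math.log(1.0 - prior_positive),
        },
    }


def classify(model, text):
    """Return 'positive' or 'unexpected' for one test document."""
    vocab_size = len(model["vocab"])
    scores = {}
    for label in ("positive", "unexpected"):
        score = model["log_prior"][label]
        for word in tokenize(text):
            if word not in model["vocab"]:
                continue  # words unseen in training are ignored
            # Laplace (add-one) smoothed word likelihood
            numerator = model["counts"][label][word] + 1
            denominator = model["totals"][label] + vocab_size
            score += math.log(numerator / denominator)
        scores[label] = score
    # Ties go to the positive class.
    if scores["positive"] >= scores["unexpected"]:
        return "positive"
    return "unexpected"
```

With toy data, a model is built by calling `make_artificial_negative(P, U)`, passing the result to `train_naive_bayes(P, an)`, and then calling `classify(model, doc)` on each test document. Test documents whose words resemble AN more than P are flagged as unexpected.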

Xiaoli Li, Bing Liu, See-Kiong Ng
