Active learning for logistic regression: an evaluation

15 years 1 month ago

Download www.andrewschein.com

Which active learning methods can we expect to yield good performance in learning binary and multi-category logistic regression classifiers? Addressing this question is a natural first step in providing robust solutions for active learning across a wide variety of exponential models including maximum entropy, generalized linear, log-linear, and conditional random field models. For the logistic regression model we re-derive the variance reduction method known in experimental design circles as ‘A-optimality.’ We then run comparisons against different variations of the most widely used heuristic schemes: query by committee and uncertainty sampling, to discover which methods work best for different classes of problems and why. We find that among the strategies tested, the experimental design methods are most likely to match or beat a random sample baseline. The heuristic alternatives produced mixed results, with an uncertainty sampling variant called margin sampling and a derivat...
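To make the uncertainty sampling heuristics named in the abstract concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes a classifier has already produced a matrix of predicted class probabilities for the unlabeled pool; the function names `margin_scores`, `entropy_scores`, and `select_by_margin` are hypothetical and chosen only for illustration. Margin sampling queries the example whose two most probable classes are closest in probability, while the entropy variant queries the example with the highest predictive entropy.

```python
import numpy as np


def margin_scores(probs):
    """Gap between the two largest class probabilities in each row.

    A small gap means the model is torn between two classes.
    """
    probs = np.asarray(probs, dtype=float)
    # Sort each row ascending and keep the two largest entries.
    top_two = np.sort(probs, axis=1)[:, -2:]
    return top_two[:, 1] - top_two[:, 0]


def entropy_scores(probs):
    """Predictive entropy of each row; larger means more uncertain."""
    probs = np.asarray(probs, dtype=float)
    # Clip inside the log so zero probabilities contribute 0, not NaN.
    safe = np.clip(probs, 1e-12, 1.0)
    return -np.sum(probs * np.log(safe), axis=1)


def select_by_margin(probs, k=1):
    """Indices of the k pool examples with the smallest margin."""
    return np.argsort(margin_scores(probs), kind="stable")[:k]


# Illustrative (made-up) predicted probabilities for a pool of three
# unlabeled examples and three classes.
pool_probs = np.array([
    [0.90, 0.05, 0.05],   # confident prediction
    [0.40, 0.35, 0.25],   # spread across all classes
    [0.50, 0.50, 0.00],   # tie between two classes
])

margin_query = select_by_margin(pool_probs, k=1)
entropy_query = int(np.argmax(entropy_scores(pool_probs)))
```

On this toy pool the two heuristics pick different examples: margin sampling selects the exact two-way tie, while entropy prefers the row whose mass is spread over all three classes. In the binary case the two rankings coincide, so the distinction matters only for multi-category problems.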

Andrew I. Schein, Lyle H. Ungar
