Abstract. We analyze the expected cost of a greedy active learning algorithm. Our analysis extends previous work to a more general setting in which different queries have different costs. Moreover, queries may have more than two possible responses, and the distribution over hypotheses may be non-uniform. Specific applications include active learning with label costs, active learning for multiclass and partial label queries, and batch mode active learning. We also discuss an approximate version of the algorithm, which is of interest when there are very many queries.

1 Motivation

We first motivate the problem by describing it informally. Imagine two people playing a variation of twenty questions. Player 1 selects an object from a finite set, and it is up to player 2 to identify the selected object by asking questions chosen from a finite set. We assume that for every object and every question the answer is unambiguous: each question maps each object to a single answer. Furthermore, each question has associated...
Andrew Guillory, Jeff A. Bilmes