An “active learning system” will sequentially decide which unlabeled instance to label, with the goal of efficiently gathering the information necessary to produce a good classifier. Some such systems greedily select the next instance based only on properties of that instance and the few currently labeled points — e.g., selecting the one closest to the current classification boundary. Unfortunately, these approaches ignore the valuable information contained in the other unlabeled instances, which can help identify a good classifier much faster. For the previous approaches that do exploit this unlabeled data, this information is mostly used in a conservative way. One common property of the approaches in the literature is that the active learner sticks to one single query selection criterion in the whole process. We propose a system, MM+M, that selects the query instance that is able to provide the maximum conditional mutual information about the labels of the unlabeled instan...