Active learning aims to reduce the number of labels required for classification. The main difficulty is finding a good trade-off between exploration and exploitation in the labeling process, which depends, among other things, on the classification task, the distribution of the data, and the employed classification scheme. In this paper, we analyze different sampling criteria, including a novel density-based criterion, and demonstrate the importance of combining exploration and exploitation sampling criteria. We also show that a time-varying combination of sampling criteria often improves performance. Finally, by formulating the criteria selection as a Markov decision process, we propose a novel feedback-driven framework based on reinforcement learning. Our method does not require prior information on the dataset or the sampling criteria; instead, it adapts the sampling strategy during the learning process through experience. We evaluate our approach on three challenging object r...
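
To make the idea of selecting sampling criteria through a Markov decision process more concrete, the following is a minimal sketch and not the method of this paper. It assumes tabular Q-learning with an epsilon-greedy policy, a state that is only a coarse time bucket of the labeling process, and an action that is the weight given to an exploration criterion. The names `ACTIONS`, `combined_score`, `toy_reward`, and `train` are hypothetical, and the reward is a made-up signal in which exploration pays off early and exploitation pays off later.

```python
import random

# Hypothetical action set: weight given to the exploration criterion
# (the remaining 1 - weight goes to the exploitation criterion).
ACTIONS = [0.0, 0.5, 1.0]


def combined_score(explore, exploit, weight):
    """Linear mix of two per-sample criterion scores."""
    return weight * explore + (1.0 - weight) * exploit


def select_action(q_row, epsilon, rng):
    """Epsilon-greedy choice over the Q-values of the current state."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


def toy_reward(state, action):
    """Made-up feedback: full exploration is best in the first time
    bucket, full exploitation in all later ones."""
    best = 2 if state == 0 else 0
    return 1.0 if action == best else 0.0


def run_episode(q, reward_fn, n_steps, n_states, epsilon, rng):
    """One simulated labeling run; the state is a coarse time bucket."""
    for t in range(n_steps):
        state = min(t * n_states // n_steps, n_states - 1)
        next_state = min((t + 1) * n_states // n_steps, n_states - 1)
        action = select_action(q[state], epsilon, rng)
        q_update(q, state, action, reward_fn(state, action), next_state)


def train(n_episodes=200, n_steps=20, n_states=2, epsilon=0.2, seed=0):
    """Learn a Q-table over (time bucket, criterion weight) pairs."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(n_states)]
    for _ in range(n_episodes):
        run_episode(q, toy_reward, n_steps, n_states, epsilon, rng)
    return q


def greedy_policy(q):
    """Exploration weight with the highest Q-value in each time bucket."""
    return [ACTIONS[max(range(len(row)), key=lambda a: row[a])] for row in q]
```

Under these assumptions, `greedy_policy(train())` yields one exploration weight per time bucket, which is one simple way to express a time-varying combination of criteria. In an actual active learner, the chosen weight would be passed to something like `combined_score` to rank unlabeled samples, and the reward would come from feedback observed after each label request rather than from `toy_reward`.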