Abstract—We consider the restless multi-armed bandit (RMAB) problem with unknown dynamics. At each time, a player chooses K out of N (N > K) arms to play. The state of each ar...
We give an algorithm for the on-line learning of permutations. The algorithm maintains its uncertainty about the target permutation as a doubly stochastic weight matrix, and makes...
In this paper we study the online learning problem involving rested and restless multiarmed bandits with multiple plays. The system consists of a single player/user and a set of K...
Abstract. Algorithm selection is typically based on models of algorithm performance learned during a separate offline training sequence, which can be prohibitively expensive. In r...
The kernel Perceptron is an appealing online learning algorithm that has a drawback: whenever it makes an error it must increase its support set, which slows training and testing ...