We provide a framework to exploit dependencies among arms in multi-armed bandit problems, when the dependencies are in the form of a generative model on clusters of arms. We find ...
We consider bandit problems involving a large (possibly infinite) collection of arms, in which the expected reward of each arm is a linear function of an r-dimensional random vect...
In standard online learning, the goal of the learner is to maintain an average loss that is "not too big" compared to the loss of the best-performing function in a fixed...
This paper examines the generalization properties of online convex programming algorithms when the loss function is Lipschitz and strongly convex. Our main result is a sharp bound...
We describe online algorithms for learning a rotation from pairs of unit vectors in Rn . We show that the expected regret of our online algorithm compared to the best fixed rotati...