We give an algorithm for the bandit version of a very general online optimization problem considered by Kalai and Vempala [1], for the case of an adaptive adversary. In this problem we are given a bounded set $S \subseteq \mathbb{R}^n$ of feasible points. At each time step $t$, the online algorithm must select a point $x_t \in S$ while simultaneously an adversary selects a cost vector $c_t \in \mathbb{R}^n$. The algorithm then incurs cost $c_t \cdot x_t$. Kalai and Vempala show that even if $S$ is exponentially large (or infinite), so long as we have an efficient algorithm for the offline problem (given $c \in \mathbb{R}^n$, find $x \in S$ to minimize $c \cdot x$) and so long as the cost vectors are bounded, one can efficiently solve the online problem of performing nearly as well as the best fixed $x \in S$ in hindsight. The Kalai-Vempala algorithm assumes that the cost vectors $c_t$ are given to the algorithm after each time step. In the “bandit” version of the problem, the algorithm only observes its cost, $c_t \cdot x_t$. Awerbuch and Kleinberg ...
H. Brendan McMahan, Avrim Blum
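
The online protocol described in the abstract can be illustrated with a minimal sketch of the full-information setting, in which the cost vector is revealed after every step. It uses a follow-the-perturbed-leader rule in the style of Kalai and Vempala. It is not the bandit algorithm of this paper. The toy decision set, the uniform perturbation drawn from $[0, 1/\epsilon]^n$, the default value of `epsilon`, and the names `dot`, `offline_oracle`, and `run_fpl` are all assumptions made for illustration.

```python
import random


def dot(u, v):
    """Inner product c . x of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def offline_oracle(S, c):
    """Offline problem: given a cost vector c, return x in S minimizing c . x.

    Assumption: S is a small finite list, so brute force is enough here.
    """
    return min(S, key=lambda x: dot(c, x))


def run_fpl(S, costs, epsilon=0.5, seed=0):
    """Play the online game against a fixed sequence of cost vectors.

    Full-information sketch: after each step the whole vector c_t is observed.
    In the bandit version only the scalar dot(c_t, x_t) would be available.
    Returns (cost incurred online, cost of the best fixed x in hindsight).
    """
    rng = random.Random(seed)
    n = len(S[0])
    cumulative = [0.0] * n  # sum of the cost vectors seen so far
    total = 0.0
    for c in costs:
        # Perturb the cumulative cost, then call the offline oracle.
        noise = [rng.uniform(0.0, 1.0 / epsilon) for _ in range(n)]
        perturbed = [s + p for s, p in zip(cumulative, noise)]
        x = offline_oracle(S, perturbed)
        total += dot(c, x)
        cumulative = [s + ci for s, ci in zip(cumulative, c)]
    best_fixed = offline_oracle(S, cumulative)
    return total, dot(cumulative, best_fixed)


if __name__ == "__main__":
    # Toy instance (assumed): S is the set of unit vectors in R^3.
    S = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    costs = [(1.0, 0.0, 0.0)] * 50
    online_cost, hindsight_cost = run_fpl(S, costs)
    print("online:", online_cost, "best fixed in hindsight:", hindsight_cost)
```

The difference between the two returned values is the regret against the best fixed point in hindsight, the quantity the abstract refers to as performing "nearly as well as the best fixed" $x \in S$.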