In an online linear optimization problem, on each period t, an online algorithm chooses st S from a fixed (possibly infinite) set S of feasible decisions. Nature (who may be adversarial) chooses a weight vector wt Rn , and the algorithm incurs cost c(st, wt), where c is a fixed cost function that is linear in the weight vector. In the full-information setting, the vector wt is then revealed to the algorithm, and in the bandit setting, only the cost experienced, c(st, wt), is revealed. The goal of the online algorithm is to perform nearly as well as the best fixed s S in hindsight. Many repeated decision-making problems with weights fit naturally into this framework, such as online shortest-path, online TSP, online clustering, and online weighted set cover. Previously, it was shown how to convert any efficient exact offline optimization algorithm for such a problem into an efficient online bandit algorithm in both the full-information and the bandit settings, with average cost nearl...
Sham M. Kakade, Adam Tauman Kalai, Katrina Ligett