Sciweavers

COLT
2008
Springer

High-Probability Regret Bounds for Bandit Online Linear Optimization

We present a modification of the algorithm of Dani et al. [8] for the online linear optimization problem in the bandit setting, which with high probability has regret at most O*(√T) against an adaptive adversary. This improves on the previous algorithm [8], whose regret is bounded in expectation against an oblivious adversary. We obtain the same dependence on the dimension (n^(3/2)) as that exhibited by Dani et al. The results of this paper rest firmly on those of [8] and the remarkable technique of Auer et al. [2] for obtaining high-probability bounds via optimistic estimates. This paper answers an open question: it eliminates the gap between the high-probability bounds obtained in the full-information vs. bandit settings.
Added 18 Oct 2010
Updated 18 Oct 2010
Type Conference
Year 2008
Where COLT
Authors Peter L. Bartlett, Varsha Dani, Thomas P. Hayes, Sham Kakade, Alexander Rakhlin, Ambuj Tewari