High-Probability Regret Bounds for Bandit Online Linear Optimization

Download colt2008.cs.helsinki.fi

We present a modification of the algorithm of Dani et al. [8] for the online linear optimization problem in the bandit setting, which with high probability has regret at most O*(√T) against an adaptive adversary. This improves on the previous algorithm [8], whose regret is bounded in expectation against an oblivious adversary. We obtain the same dependence on the dimension (n^(3/2)) as that exhibited by Dani et al. The results of this paper rest firmly on those of [8] and the remarkable technique of Auer et al. [2] for obtaining high-probability bounds via optimistic estimates. This paper answers an open question: it eliminates the gap between the high-probability bounds obtained in the full-information vs bandit settings.
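For readers unfamiliar with the setting, the regret referred to in the abstract can be written out as follows. This is only a sketch in standard notation: the symbols D, x_t, L_t, R_T and δ are assumptions introduced here for illustration and do not appear in the abstract, and the combined bound simply reads the abstract's O*(√T) and n^(3/2) statements together, with logarithmic factors and the dependence on δ assumed to be absorbed into O*. In the bandit setting the learner observes only the scalar loss L_t · x_t, not the vector L_t.

```latex
% Assumed notation (not from the abstract): D \subset \mathbb{R}^n is the decision set,
% x_t \in D is the learner's play and L_t \in \mathbb{R}^n the adversary's loss vector at round t.
R_T \;=\; \sum_{t=1}^{T} L_t \cdot x_t \;-\; \min_{x \in D} \sum_{t=1}^{T} L_t \cdot x

% A high-probability guarantee of the kind the abstract describes,
% for an assumed confidence parameter \delta \in (0,1):
\Pr\!\left[\, R_T \le O^{*}\!\left(n^{3/2}\sqrt{T}\right) \right] \;\ge\; 1 - \delta
```

By contrast, a bound in expectation controls only E[R_T], which is the weaker form of guarantee the abstract attributes to the earlier algorithm of [8].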

Peter L. Bartlett, Varsha Dani, Thomas P. Hayes, Sham Kakade, Alexander Rakhlin, Ambuj Tewari

COLT 2008 | Dani Et Al | Et Al | Linear Optimization Problem | Machine Learning |

Post Info

Added	18 Oct 2010
Updated	18 Oct 2010
Type	Conference
Year	2008
Where	COLT
Authors	Peter L. Bartlett, Varsha Dani, Thomas P. Hayes, Sham Kakade, Alexander Rakhlin, Ambuj Tewari
