We present a bandit algorithm, SAO (Stochastic and Adversarial Optimal), whose regret is, essentially, optimal both for adversarial rewards and for stochastic rewards. Specifically, SAO combines the O(√n) worst-case regret of Exp3 [Auer et al., 2002b] for adversarial rewards and the (poly)logarithmic regret of UCB1 [Auer et al., 2002a] for stochastic rewards. Adversarial rewards and stochastic rewards are the two main settings in the literature on (non-Bayesian) multi-armed bandits. Prior work on multi-armed bandits treats them separately, and does not attempt to jointly optimize for both. Our result falls into a general theme of achieving good worst-case performance while also taking advantage of “nice” problem instances, an important issue in the design of algorithms with partially known inputs.
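For concreteness, the two benchmark guarantees referenced above can be stated, in their standard forms from Auer et al. [2002a,b], for a K-armed bandit run over n rounds; the number of arms K and the suboptimality gaps Δ_i are not introduced in this abstract and appear here only as a hedged illustration of the two regimes SAO aims to match simultaneously:

\[
  R_n^{\mathrm{Exp3}} = O\bigl(\sqrt{n K \log K}\bigr)
  \quad \text{(adversarial rewards)},
  \qquad
  R_n^{\mathrm{UCB1}} = O\Bigl(\sum_{i:\, \Delta_i > 0} \frac{\log n}{\Delta_i}\Bigr)
  \quad \text{(stochastic rewards)},
\]

where Δ_i denotes the gap between the expected reward of the best arm and that of arm i.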