Tuning Bandit Algorithms in Stochastic Environments


Algorithms based on upper-confidence bounds for balancing exploration and exploitation are gaining popularity since they are easy to implement, efficient and effective. In this paper we consider a variant of the basic algorithm for the stochastic, multi-armed bandit problem that takes into account the empirical variance of the different arms. In earlier experimental works, such algorithms were found to outperform the competing algorithms. The purpose of this paper is to provide a theoretical explanation of these findings and provide theoretical guidelines for the tuning of the parameters of these algorithms. For this we analyze the expected regret and for the first time the concentration of the regret. The analysis of the expected regret shows that variance estimates can be especially advantageous when the payoffs of suboptimal arms have low variance. The risk analysis, rather unexpectedly, reveals that except some very special bandit problems, for upper confidence bound based a...
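The abstract does not spell out the variance-aware index, so the following is only a minimal sketch under stated assumptions: rewards lie in [0, b], the exploration function is zeta * log(t), and the index adds a variance-dependent term and a range-dependent term to the empirical mean, which is the form usually associated with this family of algorithms. The names `ucbv_index` and `run_bandit`, and the defaults `zeta=1.2` and `c=1.0`, are illustrative choices, not values taken from this page.

```python
import math
import random


def ucbv_index(mean, var, pulls, t, b=1.0, zeta=1.2, c=1.0):
    """Variance-aware upper confidence index for one arm (assumed form)."""
    if pulls == 0:
        return math.inf  # an unpulled arm is always tried first
    explore = zeta * math.log(t)  # assumed exploration function
    variance_term = math.sqrt(2.0 * var * explore / pulls)
    range_term = c * 3.0 * b * explore / pulls
    return mean + variance_term + range_term


def run_bandit(arms, horizon, seed=0):
    """Play `horizon` rounds; each arm is a callable rng -> reward in [0, 1]."""
    rng = random.Random(seed)
    k = len(arms)
    pulls = [0] * k
    sums = [0.0] * k
    sq_sums = [0.0] * k
    for t in range(1, horizon + 1):
        indices = []
        for i in range(k):
            if pulls[i] == 0:
                indices.append(math.inf)
                continue
            mean = sums[i] / pulls[i]
            # Empirical variance, clipped at 0 to absorb rounding error.
            var = max(sq_sums[i] / pulls[i] - mean * mean, 0.0)
            indices.append(ucbv_index(mean, var, pulls[i], t))
        choice = max(range(k), key=indices.__getitem__)
        reward = arms[choice](rng)
        pulls[choice] += 1
        sums[choice] += reward
        sq_sums[choice] += reward * reward
    return pulls


if __name__ == "__main__":
    # A zero-variance suboptimal arm versus a noisy better arm.
    arms = [lambda rng: 0.2,
            lambda rng: 1.0 if rng.random() < 0.8 else 0.0]
    print(run_bandit(arms, horizon=2000))
```

In this toy setup the suboptimal arm has zero variance, so its variance term vanishes and only the range term keeps it in play. That is the situation in which the abstract says variance estimates help most.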

Jean-Yves Audibert, Rémi Munos, Csaba Szepesvári


ALT 2007 | Logarithmic Cumulative Regret | Logarithmic Expected Regret | Machine Learning | Variance Estimates |


» Adapting to a Changing Environment: the Brownian Restless Bandits

» Best Arm Identification in Multi-Armed Bandits

» An approach to online optimization of heuristic coordination algorithms

» Open Loop Optimistic Planning

Post Info

Added	14 Mar 2010
Updated	14 Mar 2010
Type	Conference
Year	2007
Where	ALT
Authors	Jean-Yves Audibert, Rémi Munos, Csaba Szepesvári
