Sciweavers

ALT
2007
Springer

Tuning Bandit Algorithms in Stochastic Environments

Algorithms based on upper-confidence bounds for balancing exploration and exploitation are gaining popularity because they are easy to implement, efficient, and effective. In this paper we consider a variant of the basic algorithm for the stochastic multi-armed bandit problem that takes into account the empirical variance of the different arms. In earlier experimental work, such algorithms were found to outperform competing algorithms. The purpose of this paper is to provide a theoretical explanation of these findings and theoretical guidelines for tuning the parameters of these algorithms. To this end, we analyze the expected regret and, for the first time, the concentration of the regret. The analysis of the expected regret shows that variance estimates can be especially advantageous when the payoffs of suboptimal arms have low variance. The risk analysis, rather unexpectedly, reveals that except for some very special bandit problems, for upper confidence bound based a...
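The abstract describes an upper-confidence-bound rule that uses each arm's empirical variance but does not state the index itself. The Python sketch below is one possible reading, assuming an index of the form mean + sqrt(2 * variance * E / s) + 3 * c * b * E / s with exploration term E = zeta * log(t), where s is the number of pulls of the arm and b bounds the payoffs. The names `ucbv_index` and `run_bandit`, the parameters `zeta`, `c`, and `b`, their default values, and the two test arms are all illustrative assumptions, not taken from the listing.

```python
import math
import random


def ucbv_index(mean, var, pulls, t, b=1.0, zeta=1.2, c=1.0):
    """Variance-aware upper confidence bound for a single arm.

    Assumed form (not given in the abstract):
    mean + sqrt(2 * var * e / pulls) + 3 * c * b * e / pulls, with e = zeta * log(t).
    """
    e = zeta * math.log(t)  # exploration term, grows slowly with time
    return mean + math.sqrt(2.0 * var * e / pulls) + 3.0 * c * b * e / pulls


def run_bandit(arms, horizon, seed=0):
    """Play `horizon` rounds and return the number of pulls per arm."""
    rng = random.Random(seed)
    k = len(arms)
    pulls = [0] * k
    sums = [0.0] * k      # running sum of rewards per arm
    sq_sums = [0.0] * k   # running sum of squared rewards per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialisation: pull every arm once
        else:
            indices = []
            for i in range(k):
                mean = sums[i] / pulls[i]
                # empirical variance, clipped at 0 against rounding error
                var = max(sq_sums[i] / pulls[i] - mean * mean, 0.0)
                indices.append(ucbv_index(mean, var, pulls[i], t))
            arm = max(range(k), key=indices.__getitem__)
        reward = arms[arm](rng)
        pulls[arm] += 1
        sums[arm] += reward
        sq_sums[arm] += reward * reward
    return pulls


# Hypothetical two-armed problem with payoffs in [0, 1]:
# arm 0 is suboptimal with zero variance, arm 1 is Bernoulli(0.6).
arms = [lambda rng: 0.5, lambda rng: 1.0 if rng.random() < 0.6 else 0.0]
pulls = run_bandit(arms, horizon=2000)
```

In this toy setup the suboptimal arm has zero variance, so its square-root term vanishes and its bonus shrinks at rate 1/s rather than 1/sqrt(s). This is the regime in which the abstract says variance estimates can be especially advantageous.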
Added: 14 Mar 2010
Updated: 14 Mar 2010
Type: Conference
Year: 2007
Where: ALT
Authors: Jean-Yves Audibert, Rémi Munos, Csaba Szepesvári