Sciweavers

CoRR
2008
Springer
Linearly Parameterized Bandits
We consider bandit problems involving a large (possibly infinite) collection of arms, in which the expected reward of each arm is a linear function of an r-dimensional random vect...
Paat Rusmevichientong, John N. Tsitsiklis
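To make the linear reward structure concrete, here is a minimal Python sketch of the setting. It is not the authors' algorithm. It assumes that arms are vectors in R^r, that an unknown parameter vector z fixes every arm's mean as the inner product arm . z, that observation noise is Gaussian, and that the r coordinate directions can be pulled for exploration. The names pull, least_squares and explore_then_commit are hypothetical. The sketch explores along the coordinate directions, estimates z by least squares, and then commits to the arm with the highest estimated expected reward.

```python
import math
import random


def expected_reward(arm, z):
    """Expected reward of an arm under the linear model: the inner product arm . z."""
    return sum(a * b for a, b in zip(arm, z))


def pull(arm, z, noise_sd, rng):
    """One noisy observation: linear mean plus Gaussian noise (noise model assumed)."""
    return expected_reward(arm, z) + rng.gauss(0.0, noise_sd)


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(arms, rewards):
    """Estimate z from (arm, reward) pairs via the normal equations (X^T X) z = X^T y."""
    r = len(arms[0])
    xtx = [[sum(u[i] * u[j] for u in arms) for j in range(r)] for i in range(r)]
    xty = [sum(u[i] * y for u, y in zip(arms, rewards)) for i in range(r)]
    return solve(xtx, xty)


def explore_then_commit(arms, z, pulls_per_direction, noise_sd, seed=0):
    """Hypothetical illustrative strategy: pull each coordinate direction a fixed
    number of times, estimate z, then commit to the best arm under the estimate."""
    rng = random.Random(seed)
    r = len(z)
    basis = [[1.0 if i == j else 0.0 for j in range(r)] for i in range(r)]
    explored, observed = [], []
    for e in basis:
        for _ in range(pulls_per_direction):
            explored.append(e)
            observed.append(pull(e, z, noise_sd, rng))
    z_hat = least_squares(explored, observed)
    best = max(arms, key=lambda u: expected_reward(u, z_hat))
    return z_hat, best


if __name__ == "__main__":
    z_true = [0.6, 0.8]
    # A large arm set: 360 points on the unit circle.
    circle = [[math.cos(2 * math.pi * k / 360), math.sin(2 * math.pi * k / 360)]
              for k in range(360)]
    z_hat, arm = explore_then_commit(circle, z_true, pulls_per_direction=50, noise_sd=0.1)
    print("estimated z:", z_hat, "chosen arm:", arm)
```

Because every arm shares the same parameter vector, pulling just a few directions is enough to learn about all of them. That shared structure is what makes a very large, or infinite, arm set tractable in this model.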
COLT
2007
Springer
Regret to the Best vs. Regret to the Average
We study online regret minimization algorithms in a bicriteria setting, examining not only the standard notion of regret to the best expert, but also the regret to the av...
Eyal Even-Dar, Michael J. Kearns, Yishay Mansour, ...
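The two benchmarks named in this abstract can be made concrete with a small Python sketch. It assumes that each expert receives a gain in [0, 1] every round. The online algorithm shown is standard exponential weights (Hedge), chosen only for illustration; it is not presented as the algorithm analyzed in the paper, and the function names are hypothetical. Regret to the best compares the algorithm's cumulative expected gain with that of the best single expert. Regret to the average compares it with the uniform average of all experts.

```python
import math


def hedge_gains(gains, eta):
    """Run exponential weights (Hedge) on per-round expert gains and return the
    algorithm's cumulative expected gain. gains[t][i] is expert i's gain at round t."""
    n = len(gains[0])
    cum = [0.0] * n  # cumulative gain of each expert so far
    total = 0.0
    for g in gains:
        m = max(cum)
        # Shift exponents by the max for numerical stability; weights are unchanged.
        w = [math.exp(eta * (c - m)) for c in cum]
        s = sum(w)
        total += sum(wi / s * gi for wi, gi in zip(w, g))
        for i in range(n):
            cum[i] += g[i]
    return total


def regrets(gains, alg_gain):
    """Return (regret to the best expert, regret to the average expert) in the gain setting."""
    n = len(gains[0])
    totals = [sum(g[i] for g in gains) for i in range(n)]
    best = max(totals)
    avg = sum(totals) / n
    return best - alg_gain, avg - alg_gain


if __name__ == "__main__":
    # Toy sequence (assumed): expert 0 gains on even rounds, expert 1 on odd rounds.
    toy = [[1.0, 0.0] if t % 2 == 0 else [0.0, 1.0] for t in range(100)]
    for eta in (0.05, 1.0):
        print("eta =", eta, "regrets (best, average):", regrets(toy, hedge_gains(toy, eta)))
```

Regret to the average can be negative when the algorithm beats the average expert, for example when one expert consistently dominates and the algorithm locks onto it.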
ALT
2009
Springer
Pure Exploration in Multi-armed Bandits Problems
We consider the framework of stochastic multi-armed bandit problems and study the possibilities and limitations of strategies that sequentially explore the arms. The stra...
Sébastien Bubeck, Rémi Munos, Gilles...
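As a rough illustration of the pure-exploration framework, the Python sketch below spends a fixed budget of pulls and then recommends a single arm. It scores the recommendation by the gap between the best mean and the recommended arm's mean, a quantity often called the simple regret. The sketch assumes Bernoulli rewards and uses plain round-robin (uniform) allocation followed by an empirical-best recommendation. This is a simple baseline chosen for clarity and is not necessarily one of the strategies studied in the paper. The function names are hypothetical.

```python
import random


def uniform_then_recommend(means, budget, seed=0):
    """Pull arms round-robin for `budget` rounds with Bernoulli rewards (assumed),
    then recommend the arm with the highest empirical mean."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(budget):
        arm = t % k  # uniform allocation: cycle through the arms
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    # Arms never pulled (budget < k) get -inf so they are never recommended.
    empirical = [s / c if c > 0 else float("-inf") for s, c in zip(sums, counts)]
    return max(range(k), key=lambda i: empirical[i])


def simple_regret(means, recommended):
    """Gap between the best true mean and the true mean of the recommended arm."""
    return max(means) - means[recommended]


if __name__ == "__main__":
    means = [0.5, 0.45, 0.3]
    for budget in (30, 300, 3000):
        runs = [simple_regret(means, uniform_then_recommend(means, budget, seed=s))
                for s in range(200)]
        print("budget", budget, "average simple regret", sum(runs) / len(runs))
```

Unlike cumulative regret, the rewards collected during exploration do not count here. Only the quality of the final recommendation matters, which is why a strategy may spend pulls on clearly inferior arms without penalty.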