Sciweavers

CORR
2008
Springer
15 years 3 months ago
Linearly Parameterized Bandits
We consider bandit problems involving a large (possibly infinite) collection of arms, in which the expected reward of each arm is a linear function of an r-dimensional random vect...
Paat Rusmevichientong, John N. Tsitsiklis
COLT
2007
Springer
15 years 10 months ago
Regret to the Best vs. Regret to the Average
Abstract. We study online regret minimization algorithms in a bicriteria setting, examining not only the standard notion of regret to the best expert, but also the regret to the av...
Eyal Even-Dar, Michael J. Kearns, Yishay Mansour, ...
ALT
2009
Springer
16 years 20 days ago
Pure Exploration in Multi-armed Bandits Problems
Abstract. We consider the framework of stochastic multi-armed bandit problems and study the possibilities and limitations of strategies that explore sequentially the arms. The stra...
Sébastien Bubeck, Rémi Munos, Gilles...