Sciweavers

74 search results - page 6 / 15
» Regret Bounds for Gaussian Process Bandit Problems
CORR
2004
Springer
13 years 7 months ago
Online convex optimization in the bandit setting: gradient descent without a gradient
We study a general online convex optimization problem. We have a convex set S and an unknown sequence of cost functions c1, c2, ..., and in each period, we choose a feasible po...
Abraham Flaxman, Adam Tauman Kalai, H. Brendan McM...
CORR
2006
Springer
13 years 7 months ago
How to Beat the Adaptive Multi-Armed Bandit
The multi-armed bandit is a concise model for the problem of iterated decision-making under uncertainty. In each round, a gambler must pull one of K arms of a slot machine, withou...
Varsha Dani, Thomas P. Hayes
CORR
2007
Springer
13 years 7 months ago
Bandit Algorithms for Tree Search
Bandit-based methods for tree search have recently gained popularity when applied to huge trees, e.g. in the game of Go [6]. Their efficient exploration of the tree enables to ret...
Pierre-Arnaud Coquelin, Rémi Munos
SIGMOD
2012
ACM
11 years 10 months ago
Interactive regret minimization
We study the notion of regret ratio proposed in [19] to deal with multi-criteria decision making in database systems. The regret minimization query proposed in [19] was shown to h...
Danupon Nanongkai, Ashwin Lall, Atish Das Sarma, K...
TSP
2010
13 years 2 months ago
Distributed learning in multi-armed bandit with multiple players
We formulate and study a decentralized multi-armed bandit (MAB) problem. There are distributed players competing for independent arms. Each arm, when played, offers i.i.d. reward a...
Keqin Liu, Qing Zhao