Sciweavers

13 search results - page 2 / 3
» On the Combinatorial Multi-Armed Bandit Problem with Markovi...
COLT
2010
Springer
Best Arm Identification in Multi-Armed Bandits
We consider the problem of finding the best arm in a stochastic multi-armed bandit game. The regret of a forecaster is here defined by the gap between the mean reward of the optim...
Jean-Yves Audibert, Sébastien Bubeck, Ré...
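For quick reference, the gap-based regret alluded to in the snippet above is, in its standard simple-regret form (a generic textbook definition, not necessarily the exact quantity used in the paper):

% Simple regret after n rounds: gap between the best arm's mean reward
% \mu^{*} and the mean reward \mu_{J_n} of the arm J_n recommended by the forecaster.
r_n \;=\; \mu^{*} - \mu_{J_n}, \qquad \mu^{*} \;=\; \max_{1 \le i \le K} \mu_i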
CORR
2010
Springer
The Non-Bayesian Restless Multi-Armed Bandit: a Case of Near-Logarithmic Regret
In the classic Bayesian restless multi-armed bandit (RMAB) problem, there are N arms, with rewards on all arms evolving at each time as Markov chains with known parameters. A play...
Wenhan Dai, Yi Gai, Bhaskar Krishnamachari, Qing Z...
FOCS
2007
IEEE
Approximation Algorithms for Partial-Information Based Stochastic Control with Markovian Rewards
We consider a variant of the classic multi-armed bandit problem (MAB), which we call FEEDBACK MAB, where the reward obtained by playing each of n independent arms varies according...
Sudipto Guha, Kamesh Munagala
GECCO
2010
Springer
Toward comparison-based adaptive operator selection
Adaptive Operator Selection (AOS) turns the observed impact of applying variation operators into operator selection decisions through a Credit Assignment mechanism. However, most Credit ...
Álvaro Fialho, Marc Schoenauer, Michè...
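As a rough illustration of the kind of mechanism this abstract refers to, here is a minimal sketch of bandit-style operator selection with a simple fitness-improvement credit. It is not the comparison-based method proposed in the paper; the operator names, the UCB exploration constant c, and the toy reward model are assumptions made only for the example.

import math
import random

class BanditOperatorSelector:
    """UCB-style Adaptive Operator Selection (illustrative sketch only).

    Each variation operator is treated as a bandit arm; the credit
    assigned after an application is a placeholder fitness improvement.
    """

    def __init__(self, operators, c=1.0):
        self.operators = list(operators)
        self.c = c                                         # exploration strength
        self.counts = {op: 0 for op in self.operators}     # applications per operator
        self.credit = {op: 0.0 for op in self.operators}   # running mean credit

    def select(self):
        total = sum(self.counts.values())
        # Apply each operator once before trusting the UCB scores.
        for op in self.operators:
            if self.counts[op] == 0:
                return op
        def ucb(op):
            return self.credit[op] + self.c * math.sqrt(
                2.0 * math.log(total) / self.counts[op])
        return max(self.operators, key=ucb)

    def update(self, op, reward):
        # Incremental mean of the credit received by this operator.
        self.counts[op] += 1
        self.credit[op] += (reward - self.credit[op]) / self.counts[op]

if __name__ == "__main__":
    # Toy usage: two hypothetical operators with different average payoffs.
    selector = BanditOperatorSelector(["mutation", "crossover"])
    for _ in range(200):
        op = selector.select()
        reward = max(random.gauss(0.6 if op == "mutation" else 0.4, 0.1), 0.0)
        selector.update(op, reward)
    print(selector.counts)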
ICASSP
2011
IEEE
Logarithmic weak regret of non-Bayesian restless multi-armed bandit
We consider the restless multi-armed bandit (RMAB) problem with unknown dynamics. At each time, a player chooses K out of N (N > K) arms to play. The state of each ar...
Haoyang Liu, Keqin Liu, Qing Zhao
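To make the setting in this last abstract concrete, the sketch below simulates a player who, at each time, picks the K arms with the highest UCB-style sample-mean indices on a toy restless bandit whose arms are two-state Markov chains. This is only a generic index heuristic under assumed dynamics (the transition probabilities p01 and p10 are invented for the example), not the policy or regret analysis from the paper.

import math
import random

def simulate_rmab(N=5, K=2, horizon=2000, seed=0):
    """Generic UCB-index heuristic on a toy restless bandit (illustrative only).

    Each arm is a two-state Markov chain (reward 0 or 1) with its own
    transition probabilities; all parameters are unknown to the player.
    """
    rng = random.Random(seed)
    # p01[i]: P(state 0 -> 1), p10[i]: P(state 1 -> 0) for arm i (hidden from the player).
    p01 = [rng.uniform(0.1, 0.9) for _ in range(N)]
    p10 = [rng.uniform(0.1, 0.9) for _ in range(N)]
    state = [rng.randint(0, 1) for _ in range(N)]

    plays = [0] * N
    mean = [0.0] * N          # sample-mean reward observed on each arm
    total_reward = 0.0

    for t in range(1, horizon + 1):
        # UCB-style index: sample mean plus an exploration bonus.
        def index(i):
            if plays[i] == 0:
                return float("inf")
            return mean[i] + math.sqrt(2.0 * math.log(t) / plays[i])

        chosen = sorted(range(N), key=index, reverse=True)[:K]

        # All arms evolve (restless), but only chosen arms yield observed rewards.
        for i in range(N):
            if state[i] == 0:
                state[i] = 1 if rng.random() < p01[i] else 0
            else:
                state[i] = 0 if rng.random() < p10[i] else 1
        for i in chosen:
            r = float(state[i])
            plays[i] += 1
            mean[i] += (r - mean[i]) / plays[i]
            total_reward += r

    return total_reward

if __name__ == "__main__":
    print(simulate_rmab())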