Sciweavers

66 search results - page 8 / 14
» The Nonstochastic Multiarmed Bandit Problem
NIPS
2008
13 years 9 months ago
Algorithms for Infinitely Many-Armed Bandits
We consider multi-armed bandit problems where the number of arms is larger than the possible number of experiments. We make a stochastic assumption on the mean-reward of a new sel...
Yizao Wang, Jean-Yves Audibert, Rémi Munos
ALT
2006
Springer
13 years 11 months ago
Hannan Consistency in On-Line Learning in Case of Unbounded Losses Under Partial Monitoring
In this paper the sequential prediction problem with expert advice is considered when the loss is unbounded under partial monitoring scenarios. We deal with a wide class of the par...
Chamy Allenberg, Peter Auer, László ...
CDC
2009
IEEE
14 years 8 days ago
On the myopic policy for a class of restless bandit problems with applications in dynamic multichannel access
We consider a class of restless multi-armed bandit problems that arises in multi-channel opportunistic communications, where channels are modeled as independent and stochastically...
Keqin Liu, Qing Zhao
COLT
2008
Springer
13 years 9 months ago
Regret Bounds for Sleeping Experts and Bandits
We study on-line decision problems where the set of actions available to the decision algorithm varies over time. With a few notable exceptions, such problems remained larg...
Robert D. Kleinberg, Alexandru Niculescu-Mizil, Yo...

Publication
14 years 4 months ago
Rollout Sampling Approximate Policy Iteration
Several researchers have recently investigated the connection between reinforcement learning and classification. We are motivated by proposals of approximate policy iteration schem...
Christos Dimitrakakis, Michail G. Lagoudakis