Sciweavers

22 search results - page 3 / 5
» High-Probability Regret Bounds for Bandit Online Linear Opti...
CORR
2012
Springer
12 years 3 months ago
Towards minimax policies for online linear optimization with bandit feedback
We address the online linear optimization problem with bandit feedback. Our contribution is twofold. First, we provide an algorithm (based on exponential weights) with a regret of...
Sébastien Bubeck, Nicolò Cesa-Bianch...
COLT
2008
Springer
13 years 9 months ago
Regret Bounds for Sleeping Experts and Bandits
We study on-line decision problems where the set of actions that are available to the decision algorithm varies over time. With a few notable exceptions, such problems remained larg...
Robert D. Kleinberg, Alexandru Niculescu-Mizil, Yo...
CORR
2007
Springer
13 years 7 months ago
Bandit Algorithms for Tree Search
Bandit-based methods for tree search have recently gained popularity when applied to huge trees, e.g. in the game of Go [6]. Their efficient exploration of the tree enables to ret...
Pierre-Arnaud Coquelin, Rémi Munos
COLT
2008
Springer
13 years 9 months ago
Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs
Prediction from expert advice is a fundamental problem in machine learning. A major pillar of the field is the existence of learning algorithms whose average loss approaches that ...
Elad Hazan, Satyen Kale
CORR
2010
Springer
13 years 2 months ago
Online Learning in Opportunistic Spectrum Access: A Restless Bandit Approach
We consider an opportunistic spectrum access (OSA) problem where the time-varying condition of each channel (e.g., as a result of random fading or certain primary users' activ...
Cem Tekin, Mingyan Liu