Sciweavers

Search results for: The Nonstochastic Multiarmed Bandit Problem
ML 2002, ACM
Finite-time Analysis of the Multiarmed Bandit Problem
Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while t...
Peter Auer, Nicolò Cesa-Bianchi, Paul Fisch...
COLT 2004, Springer
The Budgeted Multi-armed Bandit Problem
...abstraction of the following scenarios: choosing from among a set of alternative treatments after a fixed number of clinical trials, determining the best parameter settings for a pro...
Omid Madani, Daniel J. Lizotte, Russell Greiner
ALT 2011, Springer
Deviations of Stochastic Bandit Regret
This paper studies the deviations of the regret in a stochastic multi-armed bandit problem. When the total number of plays n is known beforehand by the agent, Audibert et al. (2009...
Antoine Salomon, Jean-Yves Audibert
CoRR 2011, Springer
Online Least Squares Estimation with Self-Normalized Processes: An Application to Bandit Problems
The analysis of online least squares estimation is at the heart of many stochastic sequential decision-making problems. We employ tools from the self-normalized processes to provi...
Yasin Abbasi-Yadkori, Dávid Pál, Csa...
FSTTCS 2010, Springer
Playing in stochastic environment: from multi-armed bandits to two-player games
Given a zero-sum infinite game, we examine the question of whether players have optimal memoryless deterministic strategies. It turns out that under some general conditions the problem for...
Wieslaw Zielonka