Search Sciweavers | Sciweavers

12 search results - page 1 / 3

» Finite-time Analysis of the Multiarmed Bandit Problem

187

Voted

JMLR
2012

165views Programming Languages» more JMLR 2012»

PAC-Bayes-Bernstein Inequality for Martingales and its Application to Multiarmed Bandits

13 years 9 months ago

Download homes.di.unimi.it

We develop a new tool for data-dependent analysis of the exploration-exploitation trade-oﬀ in learning under limited feedback. Our tool is based on two main ingredients. The ﬁ...

Yevgeny Seldin, Nicolò Cesa-Bianchi, Peter ...

claim paper

Read More »

162

Voted

ML
2002
ACM

133views Machine Learning» more ML 2002»

Finite-time Analysis of the Multiarmed Bandit Problem

15 years 6 months ago

Download homes.dsi.unimi.it

Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while t...

Peter Auer, Nicolò Cesa-Bianchi, Paul Fisch...

claim paper

Read More »

199

click to vote

CORR
2011
Springer

202views Education» more CORR 2011»

Online Least Squares Estimation with Self-Normalized Processes: An Application to Bandit Problems

15 years 1 months ago

Download www.ualberta.ca

The analysis of online least squares estimation is at the heart of many stochastic sequential decision-making problems. We employ tools from the self-normalized processes to provi...

Yasin Abbasi-Yadkori, Dávid Pál, Csa...

claim paper

Read More »

200

click to vote

COLT
2010
Springer

191views Machine Learning» more COLT 2010»

Best Arm Identification in Multi-Armed Bandits

15 years 4 months ago

Download www.di.ens.fr

We consider the problem of finding the best arm in a stochastic multi-armed bandit game. The regret of a forecaster is here defined by the gap between the mean reward of the optim...

Jean-Yves Audibert, Sébastien Bubeck, R&eac...

claim paper

Read More »

197

click to vote

COLT
2010
Springer

207views Machine Learning» more COLT 2010»

An Asymptotically Optimal Bandit Algorithm for Bounded Support Models

15 years 4 months ago

Download www.colt2010.org

Multiarmed bandit problem is a typical example of a dilemma between exploration and exploitation in reinforcement learning. This problem is expressed as a model of a gambler playi...

Junya Honda, Akimichi Takemura

claim paper

Read More »

« Prev « First page 1 / 3 Last » Next »

Sciweavers

Explore & Download

Productivity Tools

Sciweavers