Bandit problems | Sciweavers

242

CORR
2012
Springer

216views Education» more CORR 2012»

14 years 2 months ago

Reinforcement learning addresses the dilemma between exploration to ﬁnd profitable actions and exploitation to act according to the best observations already made. Bandit proble...

Ananda Narayanan B., Balaraman Ravindran

claim paper

Read More »

221

click to vote

ALT
2011
Springer

259views Machine Learning» more ALT 2011»

Deviations of Stochastic Bandit Regret

14 years 6 months ago

Download certis.enpc.fr

This paper studies the deviations of the regret in a stochastic multi-armed bandit problem. When the total number of plays n is known beforehand by the agent, Audibert et al. (2009...

Antoine Salomon, Jean-Yves Audibert

claim paper

Read More »

222

click to vote

AIPS
2011

233views Artificial Intelligence» more AIPS 2011»

Sample-Based Planning for Continuous Action Markov Decision Processes

14 years 10 months ago

Download www.chrismansley.com

In this paper, we present a new algorithm that integrates recent advances in solving continuous bandit problems with sample-based rollout methods for planning in Markov Decision P...

Christopher R. Mansley, Ari Weinstein, Michael L. ...

claim paper

Read More »

177

click to vote

COGSR
2011

71views more COGSR 2011»

Psychological models of human and optimal performance in bandit problems

15 years 1 months ago

Download www.socsci.uci.edu

In bandit problems, a decision-maker must choose between a set of alternatives, each of which has a ﬁxed but unknown rate of reward, to maximize their total number of rewards ov...

Michael D. Lee, Shunan Zhang, Miles Munro, Mark St...

claim paper

Read More »

193

click to vote

COLT
2010
Springer

129views Machine Learning» more COLT 2010»

Nonparametric Bandits with Covariates

15 years 4 months ago

Download www.princeton.edu

We consider a bandit problem which involves sequential sampling from two populations (arms). Each arm produces a noisy reward realization which depends on an observable random cov...

Philippe Rigollet, Assaf Zeevi

claim paper

Read More »

158

click to vote

CDC
2008
IEEE

104views Control Systems» more CDC 2008»

A structured multiarmed bandit problem and the greedy policy

16 years 1 months ago

Download web.mit.edu

—We consider a multiarmed bandit problem where the expected reward of each arm is a linear function of an unknown scalar with a prior distribution. The objective is to choose a s...

Adam J. Mersereau, Paat Rusmevichientong, John N. ...

claim paper

Read More »

212

click to vote

Publication

222views

Algorithms and Bounds for Rollout Sampling Approximate Policy Iteration

16 years 3 months ago

Download arxiv.org

Abstract: Several approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a supervis...

Christos Dimitrakakis, Michail G. Lagoudakis

posted by olethros

Read More »

265

click to vote

Publication

334views

Rollout Sampling Approximate Policy Iteration

16 years 3 months ago

Download www.springerlink.com

Several researchers have recently investigated the connection between reinforcement learning and classification. We are motivated by proposals of approximate policy iteration schem...

Christos Dimitrakakis, Michail G. Lagoudakis

posted by olethros

Read More »

Sciweavers

Explore & Download

Productivity Tools

Sciweavers