Search Sciweavers | Sciweavers

22 search results - page 3 / 5

» Tighter Value Function Bounds for Bayesian Reinforcement Lea...

274

click to vote

NIPS
2007

207views Information Technology» more NIPS 2007»

Bayes-Adaptive POMDPs

15 years 8 months ago

Download books.nips.cc

Bayesian Reinforcement Learning has generated substantial interest recently, as it provides an elegant solution to the exploration-exploitation trade-off in reinforcement learning...

Stéphane Ross, Brahim Chaib-draa, Joelle Pi...

claim paper

Read More »

182

Voted

NIPS
2001

144views Information Technology» more NIPS 2001»

Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning

15 years 8 months ago

Download jmlr.csail.mit.edu

Policy gradient methods for reinforcement learning avoid some of the undesirable properties of the value function approaches, such as policy degradation (Baxter and Bartlett, 2001...

Evan Greensmith, Peter L. Bartlett, Jonathan Baxte...

claim paper

Read More »

198

Voted

JMLR
2006

153views more JMLR 2006»

Collaborative Multiagent Reinforcement Learning by Payoff Propagation

15 years 7 months ago

Download jmlr.csail.mit.edu

In this article we describe a set of scalable techniques for learning the behavior of a group of agents in a collaborative multiagent setting. As a basis we use the framework of c...

Jelle R. Kok, Nikos A. Vlassis

claim paper

Read More »

281

Voted

AI
1998
Springer

177views Artificial Intelligence» more AI 1998»

Model-Based Average Reward Reinforcement Learning

15 years 7 months ago

Download web.engr.oregonstate.edu

Reinforcement Learning (RL) is the study of programs that improve their performance by receiving rewards and punishments from the environment. Most RL methods optimize the discoun...

Prasad Tadepalli, DoKyeong Ok

claim paper

Read More »

193

click to vote

ICML
2010
IEEE

167views Machine Learning» more ICML 2010»

Finite-Sample Analysis of LSTD

15 years 8 months ago

Download hal.inria.fr

In this paper we consider the problem of policy evaluation in reinforcement learning, i.e., learning the value function of a fixed policy, using the least-squares temporal-differe...

Alessandro Lazaric, Mohammad Ghavamzadeh, Ré...

claim paper

Read More »

« Prev « First page 3 / 5 Last » Next »

Sciweavers

Explore & Download

Productivity Tools

Sciweavers