Sciweavers

81 search results - page 15 / 17
» The Optimal Reward Baseline for Gradient-Based Reinforcement...
Sort
View
ICML
2009
IEEE
14 years 8 months ago
Near-Bayesian exploration in polynomial time
We consider the exploration/exploitation problem in reinforcement learning (RL). The Bayesian approach to model-based RL offers an elegant solution to this problem, by considering...
J. Zico Kolter, Andrew Y. Ng
UAI
2003
13 years 9 months ago
On the Convergence of Bound Optimization Algorithms
Many practitioners who use EM and related algorithms complain that they are sometimes slow. When does this happen, and what can be done about it? In this paper, we study the gener...
Ruslan Salakhutdinov, Sam T. Roweis, Zoubin Ghahra...
NIPS
2008
13 years 9 months ago
Signal-to-Noise Ratio Analysis of Policy Gradient Algorithms
Policy gradient (PG) reinforcement learning algorithms have strong (local) convergence guarantees, but their learning performance is typically limited by a large variance in the e...
John W. Roberts, Russ Tedrake
AAAI
2010
13 years 9 months ago
Towards Multiagent Meta-level Control
Embedded systems consisting of collaborating agents capable of interacting with their environment are becoming ubiquitous. It is crucial for these systems to be able to adapt to t...
Shanjun Cheng, Anita Raja, Victor R. Lesser
IJCAI
2007
13 years 9 months ago
Using Linear Programming for Bayesian Exploration in Markov Decision Processes
A key problem in reinforcement learning is finding a good balance between the need to explore the environment and the need to gain rewards by exploiting existing knowledge. Much ...
Pablo Samuel Castro, Doina Precup