Sciweavers

178 search results - page 13 / 36
» Probabilistic policy reuse in a reinforcement learning agent
ATAL
2008
Springer
13 years 10 months ago
Sigma point policy iteration
In reinforcement learning, least-squares temporal difference methods (e.g., LSTD and LSPI) are effective, data-efficient techniques for policy evaluation and control with linear v...
Michael H. Bowling, Alborz Geramifard, David Winga...
AGI
2011
13 years 7 days ago
Reinforcement Learning and the Bayesian Control Rule
We present an actor-critic scheme for reinforcement learning in complex domains. The main contribution is to show that planning and I/O dynamics can be separated such that an intra...
Pedro Alejandro Ortega, Daniel Alexander Braun, Si...
NIPS
2004
13 years 10 months ago
Multi-agent Cooperation in Diverse Population Games
We consider multi-agent systems whose agents compete for resources by striving to be in the minority group. The agents adapt to the environment by reinforcement learning of the pr...
K. Y. Michael Wong, S. W. Lim, Zhuo Gao
NIPS
1993
13 years 10 months ago
Robust Reinforcement Learning in Motion Planning
While exploring to find better solutions, an agent performing online reinforcement learning (RL) can perform worse than is acceptable. In some cases, exploration might have unsafe, ...
Satinder P. Singh, Andrew G. Barto, Roderic A. Gru...
ICML
2006
IEEE
14 years 9 months ago
An intrinsic reward mechanism for efficient exploration
How should a reinforcement learning agent act if its sole purpose is to efficiently learn an optimal policy for later use? In other words, how should it explore, to be able to exp...
Özgür Simsek, Andrew G. Barto