Sciweavers

27 search results (page 4 of 6) for "Policy Gradient Method for Team Markov Games"
CDC 2010 (IEEE)
Pathologies of temporal difference methods in approximate dynamic programming
Approximate policy iteration methods based on temporal differences are popular in practice, and have been tested extensively, dating to the early nineties, but the associated conve...
Dimitri P. Bertsekas
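
A minimal sketch of the temporal-difference evaluation step these approximate policy iteration methods build on: semi-gradient TD(0) with linear function approximation, run here on a toy random-walk chain (the environment and feature map are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 5-state random-walk chain; reward +1 for exiting on the right.
N_STATES, GAMMA, ALPHA = 5, 0.9, 0.05
phi = np.eye(N_STATES)   # illustrative feature map (here: one-hot, i.e. tabular)
w = np.zeros(N_STATES)   # linear value estimate: V(s) ~= w @ phi[s]

for episode in range(2000):
    s = N_STATES // 2
    while True:
        s2 = s + rng.choice([-1, 1])
        done = s2 < 0 or s2 >= N_STATES
        r = 1.0 if s2 >= N_STATES else 0.0
        # Semi-gradient TD(0): delta = r + gamma * V(s') - V(s)
        v_next = 0.0 if done else w @ phi[s2]
        delta = r + GAMMA * v_next - w @ phi[s]
        w += ALPHA * delta * phi[s]
        if done:
            break
        s = s2

print(np.round(w, 3))  # approximate state values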
IROS 2007 (IEEE)
Hysteretic Q-learning: an algorithm for decentralized reinforcement learning in cooperative multi-agent teams
Multi-agent systems (MAS) are a field of study of growing interest in a variety of domains such as robotics and distributed control. The article focuses on decentralized reinf...
Laëtitia Matignon, Guillaume J. Laurent, Nadi...
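
The defining idea of hysteretic Q-learning is an asymmetric update: a larger learning rate when the TD error is positive and a smaller one when it is negative, so an agent stays optimistic about bad outcomes caused by its teammates' exploration. A minimal sketch of that update rule (the class interface is illustrative):

```python
import numpy as np

class HystereticQ:
    """Q-learning with two learning rates: alpha for increases and
    beta < alpha for decreases, so losses likely caused by teammates'
    exploration are discounted (the hysteresis)."""

    def __init__(self, n_states, n_actions, alpha=0.1, beta=0.01, gamma=0.95):
        self.q = np.zeros((n_states, n_actions))
        self.alpha, self.beta, self.gamma = alpha, beta, gamma

    def update(self, s, a, r, s2, done):
        target = r + (0.0 if done else self.gamma * self.q[s2].max())
        delta = target - self.q[s, a]
        lr = self.alpha if delta >= 0 else self.beta  # asymmetric step size
        self.q[s, a] += lr * delta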
ML 2006 (ACM)
Universal parameter optimisation in games based on SPSA
Most game programs have a large number of parameters that are crucial for their performance. While tuning these parameters by hand is rather difficult, efficient and easy to use ge...
Levente Kocsis, Csaba Szepesvári
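
SPSA (simultaneous perturbation stochastic approximation) estimates a gradient from just two noisy evaluations of the objective per step, perturbing every parameter at once with a random sign vector. A minimal sketch, with a noisy quadratic standing in for a game-strength measurement:

```python
import numpy as np

rng = np.random.default_rng(0)

def spsa_minimize(f, theta, iters=1000, a=0.1, c=0.1):
    """SPSA: two evaluations of f per step, regardless of dimension."""
    for k in range(1, iters + 1):
        ak, ck = a / k**0.602, c / k**0.101          # standard gain schedules
        delta = rng.choice([-1.0, 1.0], size=theta.shape)  # Rademacher perturbation
        # Two-sided finite difference along the single random direction.
        g_hat = (f(theta + ck * delta) - f(theta - ck * delta)) / (2 * ck * delta)
        theta = theta - ak * g_hat
    return theta

# Illustrative noisy objective standing in for measured game performance.
f = lambda th: np.sum(th**2) + rng.normal(scale=0.01)
print(np.round(spsa_minimize(f, theta=np.ones(10)), 3))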
ICML 2001 (IEEE)
Off-Policy Temporal-Difference Learning with Function Approximation
We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms...
Doina Precup, Richard S. Sutton, Sanjoy Dasgupta
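
Off-policy TD corrects the mismatch between the behaviour policy generating the data and the target policy being evaluated with importance-sampling ratios. A minimal sketch of a generic importance-weighted TD(0) update with linear features (the basic idea only, not the paper's full per-decision TD(λ) algorithm):

```python
import numpy as np

def off_policy_td0_step(w, phi_s, phi_s2, r, done, pi_a, b_a,
                        alpha=0.01, gamma=0.99):
    """One importance-weighted TD(0) update.
    pi_a, b_a: probabilities of the taken action under the target
    policy pi and the behaviour policy b."""
    rho = pi_a / b_a                        # importance-sampling ratio
    v_next = 0.0 if done else w @ phi_s2
    delta = r + gamma * v_next - w @ phi_s
    return w + alpha * rho * delta * phi_s

# Illustrative single update with one-hot features.
w = np.zeros(4)
w = off_policy_td0_step(w, phi_s=np.eye(4)[0], phi_s2=np.eye(4)[1],
                        r=1.0, done=False, pi_a=0.9, b_a=0.5)
print(w)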
GECCO 2009 (Springer)
Uncertainty handling CMA-ES for reinforcement learning
The covariance matrix adaptation evolution strategy (CMA-ES) has proven to be a powerful method for reinforcement learning (RL). Recently, the CMA-ES has been augmented with an ada...
Verena Heidrich-Meisner, Christian Igel
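
The uncertainty-handling augmentation reevaluates candidates and increases the evaluation effort when noise reorders the ranking the strategy selects on. A simplified illustration of that principle on a plain evolution strategy (the paper's UH-CMA-ES additionally adapts step size and the covariance matrix; everything here is a sketch):

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_fitness(x, n_eval):
    """Average several noisy rollouts; stands in for an RL return estimate."""
    return np.mean([np.sum(x**2) + rng.normal(scale=0.5) for _ in range(n_eval)])

def simple_uh_es(dim=5, pop=10, parents=3, sigma=0.5, iters=100):
    mean, n_eval = np.zeros(dim), 1
    for _ in range(iters):
        xs = [mean + sigma * rng.standard_normal(dim) for _ in range(pop)]
        f1 = np.array([noisy_fitness(x, n_eval) for x in xs])
        f2 = np.array([noisy_fitness(x, n_eval) for x in xs])  # reevaluate
        # If reevaluation reshuffles the ranking, the fitness signal is too
        # noisy: spend more evaluations per candidate (uncertainty handling).
        rank_changes = np.mean(
            np.argsort(np.argsort(f1)) != np.argsort(np.argsort(f2)))
        n_eval = min(n_eval + 1, 50) if rank_changes > 0.3 else max(n_eval - 1, 1)
        best = np.argsort((f1 + f2) / 2)[:parents]
        mean = np.mean([xs[i] for i in best], axis=0)
    return mean, n_eval

print(simple_uh_es())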