Sciweavers

49 search results - page 4 / 10
» Temporal Difference and Policy Search Methods for Reinforcem...
Sort
View
ATAL
2008
Springer
13 years 9 months ago
Sigma point policy iteration
In reinforcement learning, least-squares temporal difference methods (e.g., LSTD and LSPI) are effective, data-efficient techniques for policy evaluation and control with linear v...
Michael H. Bowling, Alborz Geramifard, David Winga...
ICML
2003
IEEE
14 years 22 days ago
The Significance of Temporal-Difference Learning in Self-Play Training TD-Rummy versus EVO-rummy
Reinforcement learning has been used for training game playing agents. The value function for a complex game must be approximated with a continuous function because the number of ...
Clifford Kotnik, Jugal K. Kalita
CDC
2010
IEEE
136views Control Systems» more  CDC 2010»
13 years 2 months ago
Pathologies of temporal difference methods in approximate dynamic programming
Approximate policy iteration methods based on temporal differences are popular in practice, and have been tested extensively, dating to the early nineties, but the associated conve...
Dimitri P. Bertsekas
ICML
2010
IEEE
13 years 5 months ago
Temporal Difference Bayesian Model Averaging: A Bayesian Perspective on Adapting Lambda
Temporal difference (TD) algorithms are attractive for reinforcement learning due to their ease-of-implementation and use of "bootstrapped" return estimates to make effi...
Carlton Downey, Scott Sanner
CORR
2010
Springer
204views Education» more  CORR 2010»
13 years 6 months ago
Predictive State Temporal Difference Learning
We propose a new approach to value function approximation which combines linear temporal difference reinforcement learning with subspace identification. In practical applications...
Byron Boots, Geoffrey J. Gordon