Technical Update: Least-Squares Temporal Difference Learning

TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it may make inefficient use of data, and it requires the user to manually tune a stepsize schedule for good performance. For the case of linear value function approximations and λ = 0, the Least-Squares TD (LSTD) algorithm of Bradtke and Barto (1996, Machine Learning, 22:1
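The contrast drawn in the abstract can be illustrated with a minimal sketch, which is not taken from the paper. It compares one pass of incremental TD(0) against LSTD(0) for a linear value function. The three-state chain, the features [1, s], the step size 0.1, the discount of 1, and all function names are assumptions chosen for illustration. The LSTD part accumulates A = Σ φ(s)(φ(s) − γφ(s'))ᵀ and b = Σ φ(s)r, then solves Aθ = b.

```python
from typing import List, Tuple

# One observed transition: (phi(s), reward, phi(s')).
# A terminal successor state is represented by all-zero features.
Transition = Tuple[List[float], float, List[float]]


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def td0(transitions: List[Transition], n_features: int,
        alpha: float, gamma: float = 1.0) -> List[float]:
    """Incremental TD(0): one small update per transition, scaled by alpha."""
    theta = [0.0] * n_features
    for phi, r, phi_next in transitions:
        delta = r + gamma * dot(theta, phi_next) - dot(theta, phi)  # TD error
        theta = [t + alpha * delta * f for t, f in zip(theta, phi)]
    return theta


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (assumes A is nonsingular)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def lstd0(transitions: List[Transition], n_features: int,
          gamma: float = 1.0) -> List[float]:
    """LSTD(0): accumulate A and b over the data, then solve A theta = b."""
    A = [[0.0] * n_features for _ in range(n_features)]
    b = [0.0] * n_features
    for phi, r, phi_next in transitions:
        for i in range(n_features):
            b[i] += phi[i] * r
            for j in range(n_features):
                A[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve(A, b)


# Hypothetical episode: states 0 -> 1 -> 2 -> terminal, reward -1 per step,
# features phi(s) = [1, s]. The true values are V(s) = -3 + s.
episode: List[Transition] = [
    ([1.0, 0.0], -1.0, [1.0, 1.0]),
    ([1.0, 1.0], -1.0, [1.0, 2.0]),
    ([1.0, 2.0], -1.0, [0.0, 0.0]),
]

theta_td = td0(episode, n_features=2, alpha=0.1)
theta_lstd = lstd0(episode, n_features=2)
print("TD(0) after one pass:", theta_td)
print("LSTD(0) after one pass:", theta_lstd)
```

On this toy episode the least-squares solve recovers the weights [-3, 1] from a single pass with no step size to choose, whereas the TD(0) weights have moved only a small way from zero and depend on the assumed value of `alpha`. The example is deliberately tiny; with fewer transitions than features the matrix `A` would be singular and `solve` would fail.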

Justin A. Boyan

Algorithm | Approximate Policy Evaluation | Machine Learning | ML 2002 | Value Function |

Added	22 Dec 2010
Updated	22 Dec 2010
Type	Journal
Year	2002
Where	ML
Authors	Justin A. Boyan
