Sciweavers

ML
2002
ACM

Technical Update: Least-Squares Temporal Difference Learning

13 years 11 months ago
Technical Update: Least-Squares Temporal Difference Learning
TD() is a popular family of algorithms for approximate policy evaluation in large MDPs. TD() works by incrementally updating the value function after each observed transition. It has two major drawbacks: it may make inefficient use of data, and it requires the user to manually tune a stepsize schedule for good performance. For the case of linear value function approximations and = 0, the Least-Squares TD (LSTD) algorithm of Bradtke and Barto (1996, Machine learning, 22:1
Justin A. Boyan
Added 22 Dec 2010
Updated 22 Dec 2010
Type Journal
Year 2002
Where ML
Authors Justin A. Boyan
Comments (0)