Sciweavers

CORR
2006
Springer

A Unified View of TD Algorithms; Introducing Full-Gradient TD and Equi-Gradient Descent TD

This paper addresses the issue of policy evaluation in Markov Decision Processes, using linear function approximation. It provides a unified view of algorithms such as TD(λ), LSTD(λ), iLSTD, residual-gradient TD. It is asserted that they all consist in minimizing a gradient function and differ by the form of this function and their means of minimizing it. Two new schemes are introduced in that framework: Full-gradient TD which uses a generalization of the principle introduced in iLSTD, and EGD TD, which reduces the gradient by successive equi-gradient descents. These three algorithms form a new intermediate family with the interesting property of making much better use of the samples than TD while keeping a gradient descent scheme, which is useful for complexity issues and optimistic policy iteration.

1 The policy evaluation problem
A Markov Decision Process (MDP) describes a dynamical system and an agent. The system is described by its state s ∈ S. When considering discrete time, the age...
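
For orientation, the baseline that the abstract compares against, TD with linear function approximation, can be sketched as follows. This is a minimal illustration of plain TD(0) only, not of Full-gradient TD or EGD TD, whose details are not given in this listing. The function names (`td0_update`, `evaluate_random_walk`), the five-state random-walk task, the one-hot features, and the step size are all assumptions chosen for the example.

```python
import random


def td0_update(theta, phi_s, reward, phi_next, alpha, gamma):
    """One TD(0) step for a linear value estimate V(s) = theta . phi(s)."""
    v_s = sum(t * f for t, f in zip(theta, phi_s))
    v_next = sum(t * f for t, f in zip(theta, phi_next))
    delta = reward + gamma * v_next - v_s  # temporal-difference error
    # Move theta along the feature vector of the visited state.
    return [t + alpha * delta * f for t, f in zip(theta, phi_s)]


def evaluate_random_walk(n_states=5, episodes=5000, alpha=0.01, gamma=1.0, seed=0):
    """Evaluate a fixed uniform left/right policy on a hypothetical chain.

    Leaving the chain on the right yields reward 1, on the left reward 0.
    One-hot features make the tabular case a special case of the linear one.
    """
    rng = random.Random(seed)

    def one_hot(s):
        return [1.0 if i == s else 0.0 for i in range(n_states)]

    terminal = [0.0] * n_states  # terminal states have value 0
    theta = [0.0] * n_states
    for _ in range(episodes):
        s = n_states // 2
        while True:
            s_next = s + rng.choice((-1, 1))
            done = s_next < 0 or s_next >= n_states
            reward = 1.0 if s_next >= n_states else 0.0
            phi_next = terminal if done else one_hot(s_next)
            theta = td0_update(theta, one_hot(s), reward, phi_next, alpha, gamma)
            if done:
                break
            s = s_next
    return theta


if __name__ == "__main__":
    # For this chain the exact values are (i + 1) / 6 for state i.
    print(evaluate_random_walk())
```

Each sample is used once and then discarded, which is the sample inefficiency that the abstract says the intermediate family of algorithms improves on.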
Manuel Loth, Philippe Preux
Added 11 Dec 2010
Updated 11 Dec 2010
Type Journal
Year 2006
Where CORR