A Unified View of TD Algorithms; Introducing Full-Gradient TD and Equi-Gradient Descent TD


Download hal.inria.fr

This paper addresses the issue of policy evaluation in Markov Decision Processes, using linear function approximation. It provides a unified view of algorithms such as TD(λ), LSTD(λ), iLSTD, and residual-gradient TD. It is asserted that they all consist in minimizing a gradient function and differ by the form of this function and their means of minimizing it. Two new schemes are introduced in that framework: Full-gradient TD, which uses a generalization of the principle introduced in iLSTD, and EGD TD, which reduces the gradient by successive equi-gradient descents. These three algorithms form a new intermediate family with the interesting property of making much better use of the samples than TD while keeping a gradient descent scheme, which is useful for complexity issues and optimistic policy iteration.

1 The policy evaluation problem

A Markov Decision Process (MDP) describes a dynamical system and an agent. The system is described by its state s ∈ S. When considering discrete time, the age...
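To make the contrast between incremental TD and least-squares TD concrete, here is a minimal Python sketch of policy evaluation with linear function approximation. It is not taken from the paper and does not reproduce Full-gradient TD or EGD TD. It only shows plain TD(0) next to LSTD(0) on a hypothetical two-state deterministic chain with one-hot features. The function names (`td0`, `lstd0`, `solve`, `one_hot`), the chain, the discount factor, and the step size are all assumptions chosen for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def one_hot(s, n=2):
    # Hypothetical feature map: one indicator feature per state.
    return [1.0 if i == s else 0.0 for i in range(n)]

def td0(transitions, phi, gamma, alpha, n_features):
    """TD(0) with linear values V(s) = theta . phi(s); one small step per sample."""
    theta = [0.0] * n_features
    for s, r, s_next in transitions:
        f = phi(s)
        delta = r + gamma * dot(theta, phi(s_next)) - dot(theta, f)  # TD error
        theta = [t + alpha * delta * x for t, x in zip(theta, f)]
    return theta

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def lstd0(transitions, phi, gamma, n_features):
    """LSTD(0): accumulate A = sum phi (phi - gamma phi')^T, b = sum phi r, solve A theta = b."""
    n = n_features
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for s, r, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        for i in range(n):
            b[i] += f[i] * r
            for j in range(n):
                A[i][j] += f[i] * (f[j] - gamma * f_next[j])
    return solve(A, b)

# Hypothetical chain: state 0 -> 1 with reward 0, state 1 -> 0 with reward 1.
# With gamma = 0.5 the Bellman equations give V(0) = 2/3 and V(1) = 4/3.
cycle = [(0, 0.0, 1), (1, 1.0, 0)]
gamma = 0.5

print("LSTD(0), 2 samples:  ", lstd0(cycle, one_hot, gamma, 2))
print("TD(0),   2 samples:  ", td0(cycle, one_hot, gamma, 0.1, 2))
print("TD(0),   2000 samples:", td0(cycle * 1000, one_hot, gamma, 0.1, 2))
```

On this toy chain, LSTD(0) recovers the exact values from a single pass over the two transitions, whereas TD(0) with a fixed step size needs many repetitions of the same samples to get close. The price is that LSTD solves a linear system whose cost grows with the number of features, which is the kind of complexity trade-off the abstract alludes to.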

Manuel Loth, Philippe Preux


CORR 2006 | Education | Markov Decision | Policy | Policy Evaluation |


Post Info

Added	11 Dec 2010
Updated	11 Dec 2010
Type	Journal
Year	2006
Where	CORR
Authors	Manuel Loth, Philippe Preux
