Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(λ) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(λ) and Q-learning belong.

Copyright © Massachusetts Institute of Technology, 1993. This report describes research done at the Dept. of Brain and Cognitive Sciences, the Center for Biological and Computational Learning, and the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. Support for CBCL is provided in part by a grant from the NSF...
Tommi Jaakkola, Michael I. Jordan, Satinder P. Singh
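
For concreteness, the following is a minimal sketch (not taken from the report) of the tabular Q-learning update that the convergence theorem covers; the synthetic MDP, epsilon-greedy exploration policy, and step-size schedule are illustrative assumptions chosen only to make the stochastic-approximation form of the update visible.

```python
# Illustrative sketch only: tabular Q-learning on a small synthetic MDP.
# The MDP, exploration scheme, and step sizes are assumptions for
# illustration; they are not part of the report.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, gamma = 5, 2, 0.9
# Random transition probabilities P[s, a, s'] and deterministic rewards R[s, a].
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.standard_normal((n_states, n_actions))

Q = np.zeros((n_states, n_actions))
visits = np.zeros((n_states, n_actions))

s = 0
for t in range(50_000):
    # Epsilon-greedy exploration (any sufficiently exploring policy suffices
    # for every state-action pair to be updated infinitely often).
    a = rng.integers(n_actions) if rng.random() < 0.1 else int(np.argmax(Q[s]))
    s_next = rng.choice(n_states, p=P[s, a])
    visits[s, a] += 1
    # Decaying step size of the kind required by stochastic approximation
    # (square-summable but not summable).
    alpha = 1.0 / visits[s, a]
    # Sampled, incremental version of the dynamic-programming (Bellman) backup.
    Q[s, a] += alpha * (R[s, a] + gamma * np.max(Q[s_next]) - Q[s, a])
    s = s_next

print(np.round(Q, 3))
```

The single line updating `Q[s, a]` is the point of contact with the theory: it is a noisy, sample-based contraction toward the DP fixed point, which is exactly the structure the convergence theorem of the report addresses.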