
COLT
2000
Springer

Estimation and Approximation Bounds for Gradient-Based Reinforcement Learning

We model reinforcement learning as the problem of learning to control a Partially Observable Markov Decision Process (POMDP), and focus on gradient ascent approaches to this problem. In [3] we introduced GPOMDP, an algorithm for estimating the performance gradient of a POMDP from a single sample path, and we proved that this algorithm almost surely converges to an approximation to the gradient. In this paper, we provide a convergence rate for the estimates produced by GPOMDP, and give an improved bound on the approximation error of these estimates. Both of these bounds are in terms of mixing times of the POMDP.
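
The abstract does not reproduce the estimator itself. As a rough illustration of what "estimating the performance gradient from a single sample path" can look like, the sketch below follows the GPOMDP-style update as it is usually stated: a discounted eligibility trace of policy score functions, averaged against the observed rewards. The softmax policy, the single-observation toy environment `toy_step`, and the names `gpomdp_estimate`, `beta`, and `num_steps` are all assumptions made for this example, not details taken from the paper.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def gpomdp_estimate(theta, step, beta, num_steps, rng):
    """Estimate the average-reward gradient from one sample path.

    theta:     action preferences of a softmax policy (assumed form,
               single observation, so the policy ignores its input).
    step:      hypothetical environment callback, action -> reward.
    beta:      discount in [0, 1) applied to the eligibility trace.
    num_steps: length of the sample path.
    """
    n = len(theta)
    z = [0.0] * n      # eligibility trace of score functions
    delta = [0.0] * n  # running average of reward * trace
    for t in range(num_steps):
        probs = softmax(theta)
        action = rng.choices(range(n), weights=probs)[0]
        reward = step(action)
        for k in range(n):
            # Score of a softmax policy: one_hot(action) - probs.
            score = (1.0 if k == action else 0.0) - probs[k]
            z[k] = beta * z[k] + score
            # Incremental mean of reward * z over the path so far.
            delta[k] += (reward * z[k] - delta[k]) / (t + 1)
    return delta


def toy_step(action):
    """Toy reward (assumption): action 1 pays 1, action 0 pays 0."""
    return 1.0 if action == 1 else 0.0


grad = gpomdp_estimate([0.0, 0.0], toy_step, beta=0.5,
                       num_steps=20000, rng=random.Random(0))
```

In this toy setting the estimate should push preference toward action 1 and away from action 0. In a POMDP with real state dynamics, the choice of `beta` trades approximation error against variance, which is the kind of dependence the abstract expresses through mixing times.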
Peter L. Bartlett, Jonathan Baxter
Type Conference
Year 2000
Where COLT
Authors Peter L. Bartlett, Jonathan Baxter