Estimation and Approximation Bounds for Gradient-Based Reinforcement Learning

We model reinforcement learning as the problem of learning to control a Partially Observable Markov Decision Process (POMDP), and focus on gradient ascent approaches to this problem. In [3] we introduced GPOMDP, an algorithm for estimating the performance gradient of a POMDP from a single sample path, and we proved that this algorithm almost surely converges to an approximation to the gradient. In this paper, we provide a convergence rate for the estimates produced by GPOMDP, and give an improved bound on the approximation error of these estimates. Both of these bounds are in terms of mixing times of the POMDP.
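The abstract does not restate the GPOMDP update itself. As a rough illustration of what "estimating the gradient from a single sample path" can look like, the sketch below follows the eligibility-trace form usually attributed to the authors' earlier work: a trace z accumulates discounted score-function terms, and the estimate is a running average of reward times trace. The tabular softmax policy, the two-state toy environment, and the names `softmax_policy`, `gpomdp_estimate`, `toy_step`, and `toy_observe` are all assumptions made for this example and do not come from the paper.

```python
import math
import random


def softmax_policy(theta, obs):
    """Action probabilities of a tabular softmax policy (hypothetical parameterisation)."""
    prefs = theta[obs]
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def gpomdp_estimate(theta, step, observe, n_steps, beta, seed=0):
    """Sketch of a GPOMDP-style gradient estimate from one sample path.

    beta in [0, 1) is the discount applied to the eligibility trace.
    `step` and `observe` are caller-supplied (hypothetical) environment functions.
    """
    rng = random.Random(seed)
    n_obs, n_act = len(theta), len(theta[0])
    z = [[0.0] * n_act for _ in range(n_obs)]      # eligibility trace
    delta = [[0.0] * n_act for _ in range(n_obs)]  # running gradient estimate
    state = 0
    for t in range(n_steps):
        obs = observe(state)
        probs = softmax_policy(theta, obs)
        action = rng.choices(range(n_act), weights=probs)[0]
        # z <- beta * z + grad_theta log mu(action | obs)
        for o in range(n_obs):
            for a in range(n_act):
                z[o][a] *= beta
        for a in range(n_act):
            z[obs][a] += (1.0 if a == action else 0.0) - probs[a]
        state, reward = step(state, action, rng)
        # delta <- delta + (reward * z - delta) / (t + 1)
        for o in range(n_obs):
            for a in range(n_act):
                delta[o][a] += (reward * z[o][a] - delta[o][a]) / (t + 1)
    return delta


def toy_step(state, action, rng):
    """Toy dynamics (assumed): the next state equals the action with probability 0.9."""
    next_state = action if rng.random() < 0.9 else 1 - action
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward


def toy_observe(state):
    """Toy observation (assumed): the state is seen directly."""
    return state


theta0 = [[0.0, 0.0], [0.0, 0.0]]
estimate = gpomdp_estimate(theta0, toy_step, toy_observe, n_steps=20000, beta=0.9)
```

In this toy setting action 1 tends to lead to the rewarding state, so the estimated gradient components for action 1 should come out positive and those for action 0 negative. The choice of beta trades bias against variance: values closer to 1 reduce the approximation error discussed in the abstract but make the estimate noisier for a fixed path length.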

Peter L. Bartlett, Jonathan Baxter

COLT 2000 | Gradient Ascent Approaches | Machine Learning | Partially Observable Markov Decision Process | Single Sample Path


Added	02 Aug 2010
Updated	02 Aug 2010
Type	Conference
Year	2000
Where	COLT
Authors	Peter L. Bartlett, Jonathan Baxter
