COLT
2000
Springer

Estimation and Approximation Bounds for Gradient-Based Reinforcement Learning

We model reinforcement learning as the problem of learning to control a Partially Observable Markov Decision Process (POMDP), and focus on gradient ascent approaches to this problem. In [3] we introduced GPOMDP, an algorithm for estimating the performance gradient of a POMDP from a single sample path, and we proved that this algorithm almost surely converges to an approximation to the gradient. In this paper, we provide a convergence rate for the estimates produced by GPOMDP, and give an improved bound on the approximation error of these estimates. Both of these bounds are in terms of mixing times of the POMDP.
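The abstract does not spell out the estimator itself, so the following is only a minimal sketch of a single-sample-path gradient estimate in the commonly cited GPOMDP form (a discounted eligibility trace of policy score vectors, averaged against observed rewards). The discount parameter `beta`, the tabular softmax policy, the fully observed two-state environment `toy_step`, and all function names are assumptions made for illustration; none of them come from this page.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def toy_step(state, action, rng):
    """Hypothetical 2-state environment: the action names the target state,
    and the move succeeds with probability 0.9. Reward is 1 in state 1."""
    next_state = action if rng.random() < 0.9 else 1 - action
    return next_state, float(next_state == 1)


def gpomdp_estimate(theta, beta, num_steps, seed=0):
    """Estimate the average-reward gradient from one sample path.

    theta[o][a] is the preference for action a under observation o
    (an assumed tabular softmax policy). beta in [0, 1) is the assumed
    trace discount that trades bias against variance.
    """
    rng = random.Random(seed)
    n_obs, n_act = len(theta), len(theta[0])
    z = [[0.0] * n_act for _ in range(n_obs)]      # eligibility trace
    delta = [[0.0] * n_act for _ in range(n_obs)]  # running gradient estimate
    state = 0
    for t in range(num_steps):
        obs = state  # assumption: the observation reveals the state
        probs = softmax(theta[obs])
        action = rng.choices(range(n_act), weights=probs)[0]
        state, reward = toy_step(state, action, rng)
        # Discount the whole trace, then add the score vector
        # grad log mu(action | obs), which is nonzero only in row `obs`.
        for o in range(n_obs):
            for a in range(n_act):
                z[o][a] *= beta
        for a in range(n_act):
            z[obs][a] += (1.0 if a == action else 0.0) - probs[a]
        # Running average of reward * trace.
        for o in range(n_obs):
            for a in range(n_act):
                delta[o][a] += (reward * z[o][a] - delta[o][a]) / (t + 1)
    return delta


if __name__ == "__main__":
    uniform = [[0.0, 0.0], [0.0, 0.0]]
    print(gpomdp_estimate(uniform, beta=0.8, num_steps=50000))
```

In this toy setting, raising the preference for action 1 makes the rewarded state more likely, so the estimated gradient entries for action 1 should come out positive and those for action 0 negative. Larger values of `beta` typically reduce the bias of the approximation at the cost of higher variance, which is the kind of trade-off that bounds in terms of mixing times are meant to quantify.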

Peter L. Bartlett, Jonathan Baxter

COLT 2000 | Gradient Ascent Approaches | Machine Learning | Partially Observable Markov Decision Process | Single Sample Path

Related Content

» Gradient-Based Learning Updates Improve XCS Performance in Multistep Problems

» The Optimal Reward Baseline for Gradient-Based Reinforcement Learning

» Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning

» Rates of Convergence of Performance Gradient Estimates Using Function Approximation and Bi...

» PAC-Bayesian Policy Evaluation for Reinforcement Learning

» Approximation and Estimation Bounds for Artificial Neural Networks

» CBR for State Value Function Approximation in Reinforcement Learning

» Sample-Efficient Evolutionary Function Approximation for Reinforcement Learning

» Nonparametric Return Distribution Approximation for Reinforcement Learning

Post Info

Added	02 Aug 2010
Updated	02 Aug 2010
Type	Conference
Year	2000
Where	COLT
Authors	Peter L. Bartlett, Jonathan Baxter