Sciweavers


NIPS
2008


Signal-to-Noise Ratio Analysis of Policy Gradient Algorithms



Policy gradient (PG) reinforcement learning algorithms have strong (local) convergence guarantees, but their learning performance is typically limited by a large variance in the estimate of the gradient. In this paper, we formulate the variance reduction problem by describing a signal-to-noise ratio (SNR) for policy gradient algorithms, and evaluate this SNR carefully for the popular Weight Perturbation (WP) algorithm. We confirm that SNR is a good predictor of long-term learning performance, and that in our episodic formulation, the cost-to-go function is indeed the optimal baseline. We then propose two modifications to traditional model-free policy gradient algorithms in order to optimize the SNR. First, we examine WP using anisotropic sampling distributions, which introduces a bias into the update but increases the SNR; this bias can be interpreted as following the natural gradient of the cost function. Second, we show that non-Gaussian distributions can also increase the SNR, and ...
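The baseline claim in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes a hypothetical quadratic cost, isotropic Gaussian perturbations, and an SNR taken as the expected power of the update along the true gradient divided by the expected power perpendicular to it. That definition is an assumption here, since the abstract does not spell it out. The function names `cost`, `wp_update`, and `empirical_snr` are invented for this example.

```python
import random


def cost(w):
    """Hypothetical quadratic cost J(w) = sum(w_i^2), chosen only for illustration."""
    return sum(x * x for x in w)


def wp_update(w, sigma, eta, baseline, rng):
    """One weight-perturbation update: -eta * (J(w + z) - b) * z, with z ~ N(0, sigma^2 I)."""
    z = [rng.gauss(0.0, sigma) for _ in w]
    perturbed = [wi + zi for wi, zi in zip(w, z)]
    delta = cost(perturbed) - baseline
    return [-eta * delta * zi for zi in z]


def empirical_snr(w, sigma, eta, baseline, n_samples=20000, seed=0):
    """Monte Carlo estimate of (update power along the true gradient) /
    (update power perpendicular to it). This SNR definition is an assumption."""
    rng = random.Random(seed)
    grad = [2.0 * x for x in w]  # exact gradient of the quadratic cost above
    norm = sum(g * g for g in grad) ** 0.5
    unit = [g / norm for g in grad]
    parallel_power = 0.0
    perpendicular_power = 0.0
    for _ in range(n_samples):
        dw = wp_update(w, sigma, eta, baseline, rng)
        proj = sum(d * u for d, u in zip(dw, unit))
        parallel_power += proj * proj
        perpendicular_power += sum(d * d for d in dw) - proj * proj
    return parallel_power / perpendicular_power


w = [1.0, -2.0, 0.5, 3.0]
# Baseline equal to the unperturbed cost J(w) versus no baseline at all.
snr_with_baseline = empirical_snr(w, sigma=0.1, eta=0.01, baseline=cost(w))
snr_no_baseline = empirical_snr(w, sigma=0.1, eta=0.01, baseline=0.0)
print(snr_with_baseline, snr_no_baseline)
```

In this toy setting, subtracting the unperturbed cost removes the large constant term that otherwise multiplies the noise, so the estimated SNR with the baseline should come out higher than without it. The anisotropic and non-Gaussian sampling variants mentioned in the abstract are not covered by this sketch.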

John W. Roberts, Russ Tedrake


Information Technology | Learning Performance | NIPS 2008 | Policy Gradient | Policy Gradient Algorithms



Post Info

Added	30 Oct 2010
Updated	30 Oct 2010
Type	Conference
Year	2008
Where	NIPS
Authors	John W. Roberts, Russ Tedrake
