Multimodal Parameter-exploring Policy Gradients


Abstract: Policy Gradients with Parameter-based Exploration (PGPE) is a novel model-free reinforcement learning method that alleviates the problem of high-variance gradient estimates encountered in conventional policy gradient methods. It has been shown to drastically speed up convergence for several large-scale reinforcement learning tasks. However, the independent normal distributions that PGPE uses to search parameter space are inadequate for some problems with multimodal reward surfaces. This paper extends the basic PGPE algorithm to use a multimodal mixture distribution for each parameter while remaining efficient. Experimental results on the Rastrigin function and the inverted pendulum benchmark demonstrate the advantages of this modification, with faster convergence to better optima.
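To make the setting in the abstract concrete, here is a minimal Python sketch of the two ingredients it names: the Rastrigin function as a multimodal test surface, and drawing a parameter vector from an independent Gaussian mixture per parameter. It is an illustration only, not the authors' implementation. The helper name `sample_parameters`, the mixture weights, means and standard deviations, and the choice of negated Rastrigin value as reward are all assumptions made for this example, and no gradient update is shown.

```python
import math
import random


def rastrigin(x):
    """Standard Rastrigin function: value 0 at the origin, many local minima elsewhere."""
    return 10.0 * len(x) + sum(
        xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) for xi in x
    )


def sample_parameters(mixtures, rng):
    """Draw one parameter vector (hypothetical helper for this sketch).

    `mixtures` holds one entry per parameter; each entry is a list of
    (weight, mean, std) triples describing a 1-D Gaussian mixture.
    """
    theta = []
    for components in mixtures:
        weights = [w for w, _, _ in components]
        # Pick a mixture component according to its weight, then sample from it.
        _, mean, std = rng.choices(components, weights=weights, k=1)[0]
        theta.append(rng.gauss(mean, std))
    return theta


rng = random.Random(0)
# Assumed example: two parameters, each with two equally weighted modes.
mixtures = [[(0.5, -2.0, 0.5), (0.5, 2.0, 0.5)] for _ in range(2)]
theta = sample_parameters(mixtures, rng)
reward = -rastrigin(theta)  # assumed convention: higher reward means lower function value
```

With a single Gaussian per parameter, every sample clusters around one mean. A mixture lets the search keep probability mass on several separated regions of parameter space at once, which is the property the abstract argues matters on multimodal reward surfaces.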



Gradient | ICMLA 2010 | Independent Normal Distributions | Machine Learning | Multimodal Mixture Distributions |


Added	12 Feb 2011
Updated	12 Feb 2011
Type	Journal
Year	2010
Where	ICMLA
Authors	Frank Sehnke, Alex Graves, Christian Osendorfer, Jürgen Schmidhuber
