Direct policy search is a practical way to solve reinforcement learning problems involving continuous state and action spaces. The goal becomes finding policy parameters that maximize a noisy objective function. The Pegasus method converts this stochastic optimization problem into a deterministic one by using fixed start states and fixed random number sequences for comparing policies (Ng & Jordan, 2000). We evaluate Pegasus, and other paired comparison methods, on the mountain car problem and a difficult pursuer-evader problem. We conclude that: (i) paired tests can improve the performance of both deterministic and stochastic optimization procedures; (ii) our proposed alternatives to Pegasus can generalize better, by using a different test statistic or by changing the scenarios during learning; and (iii) adapting the number of trials used for each policy comparison yields fast and robust learning.
Malcolm J. A. Strens, Andrew W. Moore
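To make the core idea concrete, here is a minimal Python sketch (not from the paper) of a Pegasus-style paired comparison: both policies are evaluated on the same fixed scenarios, each a (start state, random seed) pair, so the difference in their returns is a deterministic function of the scenario set. The toy `step` dynamics, the linear policies, and the paired t statistic are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def step(state, action, rng):
    """Toy stochastic dynamics (stand-in for a real environment):
    noisy 1-D motion, reward = -|next_state|."""
    next_state = state + 0.1 * action + rng.normal(scale=0.05)
    return next_state, -abs(next_state)

def rollout_return(policy, start_state, seed, horizon=50):
    """One episode with a fixed start state and a fixed random number
    sequence; re-running with the same seed reproduces the same return."""
    rng = np.random.default_rng(seed)
    state, total = start_state, 0.0
    for _ in range(horizon):
        state, reward = step(state, policy(state), rng)
        total += reward
    return total

def paired_comparison(policy_a, policy_b, scenarios):
    """Evaluate both policies on identical scenarios (start state + seed),
    making the comparison deterministic given the scenario set. Returns
    the mean paired difference in return and a paired t statistic."""
    diffs = np.array([rollout_return(policy_a, s0, seed) -
                      rollout_return(policy_b, s0, seed)
                      for s0, seed in scenarios])
    t_stat = diffs.mean() / (diffs.std(ddof=1) / np.sqrt(len(diffs)))
    return diffs.mean(), t_stat

# Usage: fixed scenarios make repeated comparisons reproducible.
scenarios = [(s0, seed) for seed, s0 in enumerate(np.linspace(-1.0, 1.0, 20))]
mean_diff, t = paired_comparison(lambda s: -2.0 * s, lambda s: -0.5 * s, scenarios)
print(f"mean paired difference {mean_diff:+.3f}, paired t = {t:.2f}")
```

Because each comparison reuses the same scenarios, an optimizer can rank candidate policies without the noise of fresh sampling; the alternatives the abstract mentions would instead change the test statistic, resample the scenarios during learning, or adapt the number of trials per comparison.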