In the model-based policy search approach to reinforcement learning (RL), policies are found using a model (or "simulator") of the Markov decision process. However, for high-dimensional continuous-state tasks, it can be extremely difficult to build an accurate model, and thus the algorithm often returns a policy that works in simulation but not in real life. The other extreme, model-free RL, tends to require infeasibly large numbers of real-life trials. In this paper, we present a hybrid algorithm that requires only an approximate model, and only a small number of real-life trials. The key idea is to successively "ground" the policy evaluations using real-life trials, but to rely on the approximate model to suggest local changes. Our theoretical results show that this algorithm achieves near-optimal performance in the real system, even when the model is only approximate. Empirical results also demonstrate that, when given only a crude model and a small number of real-life trials, ...
Pieter Abbeel, Morgan Quigley, Andrew Y. Ng
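
As a rough illustration of the key idea described in the abstract, the sketch below alternates between a single real-life rollout of the current policy and a locally "grounded" model evaluation that corrects the approximate model with the discrepancies observed along that real trajectory, then lets the model suggest a local policy change. The toy dynamics, the proportional policy, and all names (real_rollout, model_step, grounded_model_return, theta, and so on) are hypothetical placeholders invented for this sketch; it is a minimal illustration under those assumptions, not the authors' implementation or experiments.

"""Minimal sketch of a hybrid model-based / real-life policy search loop.

The 'real' system and the approximate model below (a 1-D point mass whose
control gain is mismatched) are illustrative placeholders only.
"""
import numpy as np

HORIZON = 50
TARGET = 1.0

def real_step(x, u):
    # "Real" dynamics: the true gain differs from the model's.
    return x + 0.08 * u

def model_step(x, u):
    # Approximate model: systematically wrong gain.
    return x + 0.05 * u

def reward(x):
    return -(x - TARGET) ** 2

def policy(theta, x):
    # Simple proportional controller; theta is the single policy parameter.
    return theta * (TARGET - x)

def real_rollout(theta):
    """Run the current policy on the real system; return the states and return."""
    xs, ret, x = [0.0], 0.0, 0.0
    for _ in range(HORIZON):
        u = policy(theta, x)
        x = real_step(x, u)
        xs.append(x)
        ret += reward(x)
    return xs, ret

def grounded_model_return(theta, ref_theta, ref_states):
    """Evaluate theta in the model, 'grounded' by the observed real trajectory.

    At each time step we add the discrepancy between the real transition seen
    under the reference policy and what the model predicted at that point.
    """
    ret, x = 0.0, 0.0
    for t in range(HORIZON):
        u_ref = policy(ref_theta, ref_states[t])
        bias = ref_states[t + 1] - model_step(ref_states[t], u_ref)
        x = model_step(x, policy(theta, x)) + bias
        ret += reward(x)
    return ret

def hybrid_policy_search(theta=0.1, step=0.05, iters=20):
    for i in range(iters):
        ref_states, real_ret = real_rollout(theta)          # one real-life trial
        candidates = [theta - step, theta, theta + step]    # local policy changes
        scores = [grounded_model_return(c, theta, ref_states) for c in candidates]
        theta = candidates[int(np.argmax(scores))]          # model suggests the move
        print(f"iter {i:2d}  theta={theta:+.3f}  real return={real_ret:8.3f}")
    return theta

if __name__ == "__main__":
    hybrid_policy_search()

In this sketch the real system is queried only once per iteration, while every candidate policy is scored inside the bias-corrected model, which mirrors the goal of needing only a small number of real-life trials.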