Exploration and apprenticeship learning in reinforcement learning

We consider reinforcement learning in systems with unknown dynamics. Algorithms such as E³ (Kearns and Singh, 2002) learn near-optimal policies by using "exploration policies" to drive the system towards poorly modeled states, so as to encourage exploration. But this makes these algorithms impractical for many systems; for example, on an autonomous helicopter, overly aggressive exploration may well result in a crash. In this paper, we consider the apprenticeship learning setting in which a teacher demonstration of the task is available. We show that, given the initial demonstration, no explicit exploration is necessary, and we can attain near-optimal performance (compared to the teacher) simply by repeatedly executing "exploitation policies" that try to maximize rewards. In finite-state MDPs, our algorithm scales polynomially in the number of states; in continuous-state linear dynamical systems, it scales polynomially in the dimension of the state. These results ar...
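The loop described in the abstract (fit a model to the teacher's demonstration, act greedily under that model, add the new data, repeat) can be sketched for a small finite-state MDP. This is an illustrative sketch only, not the authors' algorithm or code: the five-state chain environment, the self-loop default for untried state-action pairs, and all names (`true_step`, `estimate_model`, `greedy_policy`, `rollout`, `apprenticeship_loop`, `slip`) are assumptions made for the example.

```python
import random

N_STATES, N_ACTIONS, HORIZON = 5, 2, 10   # hypothetical chain MDP
REWARDS = [0.0, 0.0, 0.0, 0.0, 1.0]       # reward only in the last state

def true_step(s, a, rng, slip):
    """Unknown 'real' dynamics: action 1 moves right, action 0 moves left."""
    if rng.random() < slip:               # with probability `slip`, stay put
        return s
    return min(N_STATES - 1, s + 1) if a == 1 else max(0, s - 1)

def estimate_model(counts):
    """Maximum-likelihood transitions; untried (s, a) pairs default to a self-loop."""
    model = {}
    for s in range(N_STATES):
        for a in range(N_ACTIONS):
            seen = counts.get((s, a), {})
            total = sum(seen.values())
            model[(s, a)] = ({s2: n / total for s2, n in seen.items()}
                             if total else {s: 1.0})
    return model

def greedy_policy(model):
    """Finite-horizon value iteration on the estimated model; returns policy[t][s]."""
    value, policy = [0.0] * N_STATES, []
    for _ in range(HORIZON):
        new_value, step_policy = [], []
        for s in range(N_STATES):
            q = [REWARDS[s] + sum(p * value[s2] for s2, p in model[(s, a)].items())
                 for a in range(N_ACTIONS)]
            best = max(range(N_ACTIONS), key=lambda a: q[a])
            step_policy.append(best)
            new_value.append(q[best])
        value = new_value
        policy.append(step_policy)
    policy.reverse()                      # policy[0] now belongs to the first time step
    return policy

def rollout(choose_action, counts, rng, slip):
    """Run one episode in the real system, logging every transition into `counts`."""
    s, total = 0, 0.0
    for t in range(HORIZON):
        a = choose_action(t, s)
        s2 = true_step(s, a, rng, slip)
        bucket = counts.setdefault((s, a), {})
        bucket[s2] = bucket.get(s2, 0) + 1
        total += REWARDS[s]
        s = s2
    return total

def apprenticeship_loop(n_iters=5, n_demos=3, slip=0.1, seed=0):
    rng, counts = random.Random(seed), {}
    # Teacher demonstrations: this hypothetical teacher always moves right.
    demos = [rollout(lambda t, s: 1, counts, rng, slip) for _ in range(n_demos)]
    teacher_return = sum(demos) / n_demos
    returns = []
    for _ in range(n_iters):
        policy = greedy_policy(estimate_model(counts))   # exploitation only
        returns.append(rollout(lambda t, s: policy[t][s], counts, rng, slip))
    return teacher_return, returns
```

Calling `apprenticeship_loop()` returns the teacher's average return together with the return of each exploitation rollout, so the two can be compared. There is no exploration bonus anywhere in this sketch: the model only improves because each greedy rollout is logged into `counts` before the next model fit.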

Pieter Abbeel, Andrew Y. Ng

Explicit Exploration | Exploration Policies | ICML 2005 | Machine Learning | Near-optimal Policies

Post Info

Added	17 Nov 2009
Updated	17 Nov 2009
Type	Conference
Year	2005
Where	ICML
Authors	Pieter Abbeel, Andrew Y. Ng
