Sciweavers

ICML
2005
IEEE

Exploration and apprenticeship learning in reinforcement learning

15 years 1 months ago
Exploration and apprenticeship learning in reinforcement learning
We consider reinforcement learning in systems with unknown dynamics. Algorithms such as E3 (Kearns and Singh, 2002) learn near-optimal policies by using "exploration policies" to drive the system towards poorly modeled states, so as to encourage exploration. But this makes these algorithms impractical for many systems; for example, on an autonomous helicopter, overly aggressive exploration may well result in a crash. In this paper, we consider the apprenticeship learning setting in which a teacher demonstration of the task is available. We show that, given the initial demonstration, no explicit exploration is necessary, and we can attain near-optimal performance (compared to the teacher) simply by repeatedly executing "exploitation policies" that try to maximize rewards. In finite-state MDPs, our algorithm scales polynomially in the number of states; in continuous-state linear dynamical systems, it scales polynomially in the dimension of the state. These results ar...
Pieter Abbeel, Andrew Y. Ng
Added 17 Nov 2009
Updated 17 Nov 2009
Type Conference
Year 2005
Where ICML
Authors Pieter Abbeel, Andrew Y. Ng
Comments (0)