
JMLR
2012

Hierarchical Relative Entropy Policy Search

Many real-world problems are inherently hierarchically structured. Exploiting this structure in an agent’s policy may well be the key to improved scalability and higher performance. However, current policy search algorithms cannot exploit such hierarchical structures. We concentrate on a basic but highly relevant hierarchy: the ‘mixed option’ policy. Here, a gating network first decides which option to execute and, subsequently, the option-policy determines the action. In this paper, we reformulate learning a hierarchical policy as a latent variable estimation problem and extend Relative Entropy Policy Search (REPS) to the latent variable case. We show that our Hierarchical REPS can learn versatile solutions while also improving learning speed and the quality of the found policy compared to the nonhierarchical approach.
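The two-stage sampling described in the abstract (gating first, then the chosen option-policy) can be illustrated with a minimal Python sketch. It covers only the structure of a mixed option policy, not the REPS update itself. The softmax gating, the linear-Gaussian option-policies, and every name and parameter value below are assumptions made for illustration; they are not taken from the paper.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gating_probs(state, gating_weights):
    """pi(o | s): one linear score per option, normalized with softmax (assumed form)."""
    scores = [sum(w * x for w, x in zip(row, state)) for row in gating_weights]
    return softmax(scores)


def sample_action(state, gating_weights, option_params, rng):
    """Draw (option, action) from pi(a | s) = sum_o pi(o | s) * pi(a | s, o)."""
    probs = gating_probs(state, gating_weights)
    # Stage 1: the gating network picks which option to execute.
    option = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # Stage 2: the selected option-policy (assumed linear-Gaussian) picks the action.
    gain, bias, std = option_params[option]
    mean = sum(g * x for g, x in zip(gain, state)) + bias
    return option, rng.gauss(mean, std)


# Hypothetical parameters for two options over a 2-D state.
gating_weights = [[2.0, 0.0], [-2.0, 0.0]]
option_params = [
    ([0.5, 0.0], 1.0, 0.1),    # option 0: (gain, bias, std)
    ([-0.5, 0.0], -1.0, 0.1),  # option 1: (gain, bias, std)
]

rng = random.Random(0)
option, action = sample_action([1.0, 0.0], gating_weights, option_params, rng)
```

Because the chosen option is not observed in the resulting action alone, it plays the role of the latent variable that the abstract refers to.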
Christian Daniel, Gerhard Neumann, Jan Peters
Added 27 Sep 2012
Updated 27 Sep 2012
Type Journal
Year 2012
Where JMLR