Explanation-Based Reinforcement Learning (EBRL) was introduced by Dietterich and Flann as a way of combining the ability of Reinforcement Learning (RL) to learn optimal plans with the generalization ability of Explanation-Based Learning (EBL) (Dietterich & Flann, 1995). We extend this work to domains where the agent must order and achieve a sequence of subgoals in an optimal fashion. Hierarchical EBRL can e ectively learn optimal policies in some of these sequential task domains even when the subgoals weakly interact with each other. We also show that when a planner that can achieve the individual subgoals is available, our method converges even faster.
Prasad Tadepalli, Thomas G. Dietterich