Generating Hierarchical Structure in Reinforcement Learning from State Variables

15 years 6 months ago

Download www.csee.umbc.edu

This paper presents the CQ algorithm which decomposes and solves a Markov Decision Process (MDP) by automatically generating a hierarchy of smaller MDPs using state variables. The CQ algorithm uses a heuristic which is applicable for problems that can be modelled by a set of state variables that conform to a special ordering, defined in this paper as a "nested Markov ordering". The benefits of this approach are: (1) the automatic generation of actions and termination conditions at all levels in the hierarchy, and (2) linear scaling with the number of variables under certain conditions. This approach draws heavily on Dietterich's MAXQ value function decomposition and Hauskrecht, Meuleau, Kaelbling, Dean, Boutilier's and others region based decomposition of MDPs. The CQ algorithm is described and its functionality illustrated using a four room example. Different solutions are generated with different numbers of hierarchical levels to solve Dietterich's taxi tasks...

Bernhard Hengst

Real-time Traffic