Procedural representations of control policies have two advantages when facing the scale-up problem in learning tasks. First they are implicit, with potential for inductive generalization over a very large set of situations. Second they facilitate modularization. In this paper we compare several randomized algorithms for learning modular procedural representations. The main algorithm, called Adaptive Representation through Learning (ARL) is a genetic programming extension that relies on the discovery of subroutines. ARL is suitable for learning hierarchies of subroutines and for constructing policies to complex tasks. ARL was successfully tested on a typical reinforcement learning problem of controlling an agent in a dynamic and nondeterministic environment where the discovered subroutines correspond to agent behaviors.
Justinian P. Rosca, Dana H. Ballard