Reinforcement learning deals with learning optimal or near optimal policies while interacting with the environment. Application domains with many continuous variables are difficult to solve with existing reinforcement learning methods due to the large search space. In this paper, we use a relational repion to define powerful abstractions that allow us to incorporate domain knowledge and re-use previously learned policies in other similar problems. We also describe how to learn useful actions from human traces using a behavioural cloning approach combined with an exploration phase. Since several conflicting actions may be induced for the same state, reinforcement learning is used to learn an optimal policy over this reduced space. It is shown experimentally how a combination of behavioural cloning and reinforcement learning using a relational representation is powerful enough to learn how to fly an aircraft through different points in space and different turbulence conditions.
Eduardo F. Morales, Claude Sammut