Potential-based shaping was designed as a way of introducing background knowledge into model-free reinforcement-learning algorithms. By identifying states that are likely to have high value, this approach can decrease experience complexity, the number of trials needed to find near-optimal behavior. An orthogonal way of decreasing experience complexity is to use a model-based learning approach, which builds and exploits an explicit transition model. In this paper, we show how potential-based shaping can be redefined to work in the model-based setting, producing an algorithm that shares the benefits of both ideas.
John Asmuth, Michael L. Littman, Robert Zinkov
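As background (not part of the original abstract), the standard model-free formulation of potential-based shaping augments the environment reward with a difference of potentials; the symbols \Phi (a user-chosen potential over states), \gamma (the discount factor), R (the environment reward), and F (the shaping term) below follow the usual conventions and are an illustrative sketch rather than notation taken from this paper:

% Standard potential-based shaping in the model-free setting:
% \Phi is a user-supplied potential function over states, \gamma the discount factor.
\[
  F(s, a, s') = \gamma\,\Phi(s') - \Phi(s), \qquad
  R'(s, a, s') = R(s, a, s') + F(s, a, s').
\]

Because the shaping term telescopes along any trajectory, adding it leaves the optimal policy unchanged while biasing learning toward states the potential marks as promising.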