Sciweavers

ICML
2003
IEEE

The Influence of Reward on the Speed of Reinforcement Learning: An Analysis of Shaping

Shaping can be an effective method for improving the learning rate in reinforcement systems. Previously, shaping has been heuristically motivated and implemented. We provide a formal structure with which to interpret the improvement afforded by shaping rewards. Central to our model is the idea of a reward horizon, which focuses exploration on an MDP's critical region, a subset of states with the property that any policy that performs well on the critical region also performs well on the MDP. We provide a simple algorithm and prove that its learning time is polynomial in the size of the critical region and, crucially, independent of the size of the MDP. This identifies low reward horizons with easy-to-learn MDPs. Shaping rewards, which encode our prior knowledge about the relative merits of decisions, can be seen as artificially reducing the MDP's natural reward horizon. We demonstrate empirically the effects of using shaping to reduce the reward horizon.
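The idea that shaping lowers the reward horizon can be illustrated with a toy sketch. The code below is not the authors' algorithm, and it does not use the paper's formal definition of reward horizon. It assumes a hypothetical 8-state chain MDP with a single goal reward, a hand-written shaping bonus for moving toward the goal, and an informal proxy for the horizon: the smallest lookahead depth at which a greedy planner strictly prefers the correct action in every non-goal state. All names (`N`, `GAMMA`, `shaped_reward`, `min_horizon`) are illustrative assumptions.

```python
# Toy illustration only: a chain MDP with states 0..N-1 and a goal at the right end.
# The "horizon" computed here is an informal proxy, not the paper's definition.

N = 8                # number of states in the hypothetical chain
GOAL = N - 1         # terminal goal state
GAMMA = 0.9          # discount factor, used so that shorter paths score higher
ACTIONS = (-1, +1)   # move left or move right


def step(s, a):
    """Deterministic transition, clamped to the ends of the chain."""
    return min(max(s + a, 0), N - 1)


def sparse_reward(s, a, s2):
    """Natural reward: 1 on entering the goal, 0 everywhere else."""
    return 1.0 if s2 == GOAL and s != GOAL else 0.0


def shaped_reward(s, a, s2):
    """Natural reward plus an assumed bonus encoding 'moving right is good'."""
    return sparse_reward(s, a, s2) + 0.1 * (s2 - s)


def value(s, depth, reward):
    """Best discounted return obtainable from s within `depth` steps."""
    if depth == 0 or s == GOAL:
        return 0.0
    return max(q_value(s, a, depth, reward) for a in ACTIONS)


def q_value(s, a, depth, reward):
    """Return of taking action a in s, then acting optimally for depth - 1 steps."""
    s2 = step(s, a)
    return reward(s, a, s2) + GAMMA * value(s2, depth - 1, reward)


def min_horizon(reward):
    """Smallest lookahead depth at which 'right' strictly beats 'left' in every
    non-goal state, or None if no depth up to N achieves this."""
    for depth in range(1, N + 1):
        if all(q_value(s, +1, depth, reward) > q_value(s, -1, depth, reward)
               for s in range(GOAL)):
            return depth
    return None


if __name__ == "__main__":
    print("sparse reward horizon proxy:", min_horizon(sparse_reward))
    print("shaped reward horizon proxy:", min_horizon(shaped_reward))
```

Under these assumptions, the sparse chain requires a lookahead of N - 1 steps before the leftmost state can tell the two actions apart, while the shaped chain requires only one step, because the bonus makes the better action visible immediately.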
Adam Laud, Gerald DeJong
Added 05 Jul 2010
Updated 05 Jul 2010
Type Conference
Year 2003
Where ICML
Authors Adam Laud, Gerald DeJong