The Influence of Reward on the Speed of Reinforcement Learning: An Analysis of Shaping

Shaping can be an effective method for improving the learning rate in reinforcement systems. Previously, shaping has been heuristically motivated and implemented. We provide a formal structure with which to interpret the improvement afforded by shaping rewards. Central to our model is the idea of a reward horizon, which focuses exploration on an MDP's critical region, a subset of states with the property that any policy that performs well on the critical region also performs well on the MDP. We provide a simple algorithm and prove that its learning time is polynomial in the size of the critical region and, crucially, independent of the size of the MDP. This identifies low reward horizons with easy-to-learn MDPs. Shaping rewards, which encode our prior knowledge about the relative merits of decisions, can be seen as artificially reducing the MDP's natural reward horizon. We demonstrate empirically the effects of using shaping to reduce the reward horizon.
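The link between shaping and the reward horizon can be illustrated with a toy sketch. This is not the authors' algorithm or their formal definition: the chain MDP, the 0.1 shaping bonus, and the use of "lookahead depth needed to act correctly everywhere" as an informal stand-in for the reward horizon are all assumptions made for illustration.

```python
# Toy chain MDP (hypothetical): states 0..N-1, goal at the right end.
N = 10
GOAL = N - 1
ACTIONS = (-1, +1)  # left, right


def step(state, action):
    """Deterministic transition, clipped to the ends of the chain."""
    return min(max(state + action, 0), GOAL)


def sparse_reward(state, action):
    """Natural reward: 1 only when a transition enters the goal from another state."""
    nxt = step(state, action)
    return 1.0 if nxt == GOAL and state != GOAL else 0.0


def shaped_reward(state, action):
    """Natural reward plus an assumed shaping bonus for progress toward the goal."""
    nxt = step(state, action)
    return sparse_reward(state, action) + 0.1 * (nxt - state)


def best_return(state, depth, reward):
    """Best undiscounted return within `depth` steps, by exhaustive lookahead."""
    if depth == 0:
        return 0.0
    return max(
        reward(state, a) + best_return(step(state, a), depth - 1, reward)
        for a in ACTIONS
    )


def lookahead_action(state, depth, reward):
    """Greedy action under depth-limited lookahead; ties go to the first action (left)."""
    return max(
        ACTIONS,
        key=lambda a: reward(state, a)
        + best_return(step(state, a), depth - 1, reward),
    )


def min_depth_to_solve(reward):
    """Smallest lookahead depth at which every non-goal state chooses 'right'."""
    for depth in range(1, N + 1):
        if all(lookahead_action(s, depth, reward) == +1 for s in range(GOAL)):
            return depth
    return None


if __name__ == "__main__":
    print("sparse reward:", min_depth_to_solve(sparse_reward))
    print("shaped reward:", min_depth_to_solve(shaped_reward))
```

In this sketch the sparse reward requires looking ahead the full length of the chain before the correct action is distinguishable in state 0, whereas the shaped reward makes the correct action visible after a single step in every state.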

Adam Laud, Gerald DeJong

Critical Region | ICML 2003 | Machine Learning | MDP's Critical Region | Reward Horizon |

Added	05 Jul 2010
Updated	05 Jul 2010
Type	Conference
Year	2003
Where	ICML
Authors	Adam Laud, Gerald DeJong
