Sciweavers

ICML
2007
IEEE

Automatic shaping and decomposition of reward functions

15 years 10 days ago
Automatic shaping and decomposition of reward functions
This paper investigates the problem of automatically learning how to restructure the reward function of a Markov decision process so as to speed up reinforcement learning. We begin by describing a method that learns a shaped reward function given a set of state and temporal abstractions. Next, we consider decomposition of the per-timestep reward in multieffector problems, in which the overall agent can be decomposed into multiple units that are concurrently carrying out various tasks. We show by example that to find a good reward decomposition, it is often necessary to first shape the rewards appropriately. We then give a function approximation algorithm for solving both problems together. Standard reinforcement learning algorithms can be augmented with our methods, and we show experimentally that in each case, significantly faster learning results.
Bhaskara Marthi
Added 17 Nov 2009
Updated 17 Nov 2009
Type Conference
Year 2007
Where ICML
Authors Bhaskara Marthi
Comments (0)