The Steering Approach for Multi-Criteria Reinforcement Learning

We consider the problem of learning to attain multiple goals in a dynamic environment that is initially unknown. In addition, the environment may contain arbitrarily varying elements related to actions of other agents or to non-stationary moves of Nature. This problem is modelled as a stochastic (Markov) game between the learning agent and an arbitrary player, with a vector-valued reward function. The objective of the learning agent is to have its long-term average reward vector belong to a given target set. We devise an algorithm for achieving this task, which is based on the theory of approachability for stochastic games. This algorithm combines, in an appropriate way, a finite set of standard, scalar-reward learning algorithms. Sufficient conditions are given for the convergence of the learning algorithm to a general target set. The specialization of these results to the single-controller Markov decision problem is discussed as well.
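The basic steering idea from approachability theory can be illustrated with a small sketch. It is not the algorithm from the paper; it rests on several hypothetical simplifications: the reward is two-dimensional, the target set is a Euclidean ball (so the closest point has a closed form), and the finite set of scalar-reward learners is represented only by k evenly spaced unit directions. Whenever the running average reward lies outside the target set, the sketch computes the unit direction from that average toward its closest point in the set, selects the learner whose direction is best aligned with it, and scalarizes the vector reward by projecting it onto that learner's direction. All function names (`project_onto_ball`, `steering_direction`, `make_directions`, `select_learner`, `scalarize`, `steer`) are illustrative, and the scalar learners themselves (for example, Q-learning) are left out.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


def project_onto_ball(x: Sequence[float], center: Sequence[float], radius: float) -> Vector:
    """Euclidean projection of x onto the closed ball B(center, radius).

    Hypothetical simplification: the target set is a ball, so projection is closed-form.
    """
    diff = [xi - ci for xi, ci in zip(x, center)]
    dist = math.sqrt(sum(d * d for d in diff))
    if dist <= radius:
        return tuple(x)  # x already lies in the target set
    scale = radius / dist
    return tuple(ci + scale * d for ci, d in zip(center, diff))


def steering_direction(avg_reward: Sequence[float], center: Sequence[float],
                       radius: float) -> Optional[Vector]:
    """Unit vector from the average reward toward its closest point in the target set.

    Returns None when the average reward is already inside the target set.
    """
    closest = project_onto_ball(avg_reward, center, radius)
    diff = [p - a for p, a in zip(closest, avg_reward)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        return None
    return tuple(d / norm for d in diff)


def make_directions(k: int) -> List[Vector]:
    """k evenly spaced unit directions in the plane; each stands for one scalar learner."""
    return [(math.cos(2 * math.pi * i / k), math.sin(2 * math.pi * i / k)) for i in range(k)]


def select_learner(direction: Sequence[float], directions: Sequence[Vector]) -> int:
    """Index of the fixed direction with the largest inner product with `direction`."""
    return max(range(len(directions)),
               key=lambda i: sum(a * b for a, b in zip(directions[i], direction)))


def scalarize(reward_vec: Sequence[float], direction: Sequence[float]) -> float:
    """Scalar reward for the active learner: the reward vector projected on its direction."""
    return sum(r * u for r, u in zip(reward_vec, direction))


def steer(avg_reward: Sequence[float], center: Sequence[float], radius: float,
          directions: Sequence[Vector]) -> Optional[Tuple[int, Vector]]:
    """Choose which scalar learner to follow next, or None if no steering is needed."""
    u = steering_direction(avg_reward, center, radius)
    if u is None:
        return None
    k = select_learner(u, directions)
    return k, directions[k]


if __name__ == "__main__":
    dirs = make_directions(8)
    # The average reward (0, 0) lies outside the ball centered at (2, 0) with radius 1,
    # so the learner aligned with the +x direction is selected.
    print(steer((0.0, 0.0), (2.0, 0.0), 1.0, dirs))  # (0, (1.0, 0.0))
```

In a full learning loop, the selected learner would act for some period while receiving `scalarize(reward, direction)` as its reward, after which the average reward vector is recomputed and the steering choice repeated. Discretizing the directions is what lets a finite set of ordinary scalar-reward learners stand in for a continuum of possible steering directions; a finer grid aligns better with the exact steering direction at the cost of running more learners.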

Shie Mannor, Nahum Shimkin

Learning Algorithm | Long-term Average Reward | NIPS 2001 | NIPS 2007 | Target Set |

Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2001
Where	NIPS
Authors	Shie Mannor, Nahum Shimkin

Sciweavers