Sciweavers

NIPS
2001

The Steering Approach for Multi-Criteria Reinforcement Learning

We consider the problem of learning to attain multiple goals in a dynamic environment, which is initially unknown. In addition, the environment may contain arbitrarily varying elements related to actions of other agents or to non-stationary moves of Nature. This problem is modelled as a stochastic (Markov) game between the learning agent and an arbitrary player, with a vector-valued reward function. The objective of the learning agent is to have its long-term average reward vector belong to a given target set. We devise an algorithm for achieving this task, which is based on the theory of approachability for stochastic games. This algorithm combines, in an appropriate way, a finite set of standard, scalar-reward learning algorithms. Sufficient conditions are given for the convergence of the learning algorithm to a general target set. The specialization of these results to the single-controller Markov decision problem is discussed as well.
Shie Mannor, Nahum Shimkin
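The abstract gives no algorithmic detail, so the following is only a rough sketch of how a steering step could look, not the authors' method. It assumes the usual approachability idea: point from the current average reward vector toward the nearest point of the target set, then hand control to whichever scalar-reward learner is tuned to a direction closest to that one. The box-shaped target set, the four candidate directions, and all function names (`project_to_box`, `steering_direction`, `scalarize`, `pick_learner`) are hypothetical choices made for illustration.

```python
import math


def project_to_box(point, low, high):
    """Closest point to `point` in the box [low, high]^n (assumed target set)."""
    return [min(max(x, low), high) for x in point]


def steering_direction(avg_reward, project):
    """Unit vector from the average reward toward the target set.

    Returns None when the average reward already lies in the target set.
    """
    closest = project(avg_reward)
    gap = [c - a for c, a in zip(closest, avg_reward)]
    dist = math.sqrt(sum(g * g for g in gap))
    if dist < 1e-12:
        return None
    return [g / dist for g in gap]


def scalarize(direction, reward_vector):
    """Scalar reward: the vector reward projected onto a direction."""
    return sum(d * r for d, r in zip(direction, reward_vector))


def pick_learner(direction, candidate_directions):
    """Index of the candidate direction best aligned with `direction`.

    Each candidate stands for one scalar-reward learner (assumption).
    """
    scores = [scalarize(direction, c) for c in candidate_directions]
    return scores.index(max(scores))


# Hypothetical setup: two criteria, target set is the box [0.4, 0.6]^2,
# and one scalar-reward learner per axis-aligned direction.
candidates = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
average = [0.9, 0.5]  # the first criterion is currently too high

direction = steering_direction(average, lambda p: project_to_box(p, 0.4, 0.6))
chosen = pick_learner(direction, candidates)
print(direction, chosen)
```

In a full learner this step would be repeated as the average reward vector moves, with the chosen scalar-reward algorithm run for a while on the reward `scalarize(direction, reward_vector)` before the direction is recomputed. How often to switch, and which conditions guarantee convergence, are exactly the points the paper addresses and this sketch does not.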
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2001
Where NIPS
Authors Shie Mannor, Nahum Shimkin