Sciweavers

ICML
2005
IEEE

Dynamic preferences in multi-criteria reinforcement learning

15 years 7 days ago
Dynamic preferences in multi-criteria reinforcement learning
The current framework of reinforcement learning is based on maximizing the expected returns based on scalar rewards. But in many real world situations, tradeoffs must be made among multiple objectives. Moreover, the agent's preferences between different objectives may vary with time. In this paper, we consider the problem of learning in the presence of time-varying preferences among multiple objectives, using numeric weights to represent their importance. We propose a method that allows us to store a finite number of policies, choose an appropriate policy for any weight vector and improve upon it. The idea is that although there are infinitely many weight vectors, they may be well-covered by a small number of optimal policies. We show this empirically in two domains: a version of the Buridan's ass problem and network routing.
Sriraam Natarajan, Prasad Tadepalli
Added 17 Nov 2009
Updated 17 Nov 2009
Type Conference
Year 2005
Where ICML
Authors Sriraam Natarajan, Prasad Tadepalli
Comments (0)