Dynamic preferences in multi-criteria reinforcement learning


The current framework of reinforcement learning maximizes the expected return computed from scalar rewards. But in many real-world situations, tradeoffs must be made among multiple objectives. Moreover, the agent's preferences among different objectives may vary with time. In this paper, we consider the problem of learning in the presence of time-varying preferences among multiple objectives, using numeric weights to represent their importance. We propose a method that allows us to store a finite number of policies, choose an appropriate policy for any weight vector, and improve upon it. The idea is that although there are infinitely many weight vectors, they may be well covered by a small number of optimal policies. We show this empirically in two domains: a version of the Buridan's ass problem and network routing.
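
The store-and-choose idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each stored policy is summarized by a vector of per-objective values, and that a policy is scored for a given weight vector by the weighted sum of those values. The names `weighted_value`, `choose_policy`, and `maybe_store`, the threshold `delta`, and the example numbers are all hypothetical and exist only for illustration.

```python
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]


def weighted_value(weights: Vector, values: Vector) -> float:
    """Scalarize a per-objective value vector with a weight vector."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    return sum(w * v for w, v in zip(weights, values))


def choose_policy(weights: Vector, stored: Dict[str, Vector]) -> Tuple[str, float]:
    """Return the stored policy with the highest weighted value, and that value."""
    if not stored:
        raise ValueError("no stored policies")
    name = max(stored, key=lambda n: weighted_value(weights, stored[n]))
    return name, weighted_value(weights, stored[name])


def maybe_store(name: str, values: Vector, weights: Vector,
                stored: Dict[str, Vector], delta: float = 0.05) -> bool:
    """Keep a newly improved policy only if it beats the best stored one.

    The margin `delta` is an assumption of this sketch; it keeps the
    stored set small by rejecting near-duplicates.
    """
    _, best = choose_policy(weights, stored)
    if weighted_value(weights, values) > best + delta:
        stored[name] = tuple(values)
        return True
    return False


# Hypothetical stored policies for a two-objective problem.
stored = {
    "favor_objective_0": (1.0, 0.0),
    "balanced": (0.6, 0.6),
    "favor_objective_1": (0.0, 1.0),
}

if __name__ == "__main__":
    for w in [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]:
        print(w, choose_policy(w, stored))
```

In this sketch the chosen policy would serve as the starting point for further learning under the new weights, and `maybe_store` decides whether the improved result is worth adding to the finite set.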

Sriraam Natarajan, Prasad Tadepalli


ICML 2005 | Machine Learning | Multiple Objectives | Real World Situations | Time-varying Preferences |


Post Info

Added	17 Nov 2009
Updated	17 Nov 2009
Type	Conference
Year	2005
Where	ICML
Authors	Sriraam Natarajan, Prasad Tadepalli
