Sciweavers

ICML
2010
IEEE

Nonparametric Return Distribution Approximation for Reinforcement Learning

14 years 16 days ago
Nonparametric Return Distribution Approximation for Reinforcement Learning
Standard Reinforcement Learning (RL) aims to optimize decision-making rules in terms of the expected return. However, especially for risk-management purposes, other criteria such as the expected shortfall are sometimes preferred. Here, we describe a method of approximating the distribution of returns, which allows us to derive various kinds of information about the returns. We first show that the Bellman equation, which is a recursive formula for the expected return, can be extended to the cumulative return distribution. Then we derive a nonparametric return distribution estimator with particle smoothing based on this extended Bellman equation. A key aspect of the proposed algorithm is to represent the recursion relation in the extended Bellman equation by a simple replacement procedure of particles associated with a state by using those of the successor state. We show that our algorithm leads to a risksensitive RL paradigm. The usefulness of the proposed approach is demonstrated thro...
Tetsuro Morimura, Masashi Sugiyama, Hisashi Kashim
Added 09 Nov 2010
Updated 09 Nov 2010
Type Conference
Year 2010
Where ICML
Authors Tetsuro Morimura, Masashi Sugiyama, Hisashi Kashima, Hirotaka Hachiya, Toshiyuki Tanaka
Comments (0)