In multiple criteria Markov Decision Processes (MDP) where multiple costs are incurred at every decision point, current methods solve them by minimising the expected primary cost ...
In this paper, we propose a novel adaptive step-size approach for policy gradient reinforcement learning. A new metric is defined for policy gradients that measures the effect of ...
Takamitsu Matsubara, Tetsuro Morimura, Jun Morimot...
We consider reinforcement learning in systems with unknown dynamics. Algorithms such as E3 (Kearns and Singh, 2002) learn near-optimal policies by using "exploration policies...
We present new algorithms for inverse optimal control (or inverse reinforcement learning, IRL) within the framework of linearlysolvable MDPs (LMDPs). Unlike most prior IRL algorit...
We examine the problem of Transfer in Reinforcement Learning and present a method to utilize knowledge acquired in one Markov Decision Process (MDP) to bootstrap learning in a mor...