Multiagent reinforcement learning problems are especially difficult because of their dynamism and the size of joint state space. In this paper a new benchmark problem is proposed, ...
Most conventional Policy Gradient Reinforcement Learning (PGRL) algorithms neglect (or do not explicitly make use of) a term in the average reward gradient with respect to the pol...
We present an actor-critic scheme for reinforcement learning in complex domains. The main contribution is to show that planning and I/O dynamics can be separated such that an intra...
Pedro Alejandro Ortega, Daniel Alexander Braun, Si...
We consider the problem of multi-task reinforcement learning, where the agent needs to solve a sequence of Markov Decision Processes (MDPs) chosen randomly from a fixed but unknow...
Aaron Wilson, Alan Fern, Soumya Ray, Prasad Tadepa...
—Reinforcement learning is the scheme for unsupervised learning in which robots are expected to acquire behavior skills through self-explorations based on reward signals. There a...
Hiroaki Arie, Tetsuya Ogata, Jun Tani, Shigeki Sug...