A new training algorithm is presented for delayed reinforcement learning problems that does not assume the existence of a critic model and employs the polytope optimization algorit...
We propose a modular reinforcement learning architecture for non-linear, nonstationary control tasks, which we call multiple model-based reinforcement learning (MMRL). The basic i...
A gradient-based method for both symmetric and asymmetric multiagent reinforcement learning is introduced in this paper. Symmetric multiagent reinforcement learning addresses the ...