Abstract-- Policy Gradients with Parameter-based Exploration (PGPE) is a novel model-free reinforcement learning method that alleviates the problem of high-variance gradient estima...
Frank Sehnke, Alex Graves, Christian Osendorfer, J...
Abstract: Several approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a supervis...
The ambitious goal of transfer learning is to accelerate learning on a target task after training on a different, but related, source task. While many past transfer methods have f...
— This paper addresses learning based adaptive resource allocation for wireless MIMO channels with Markovian fading. The problem is posed as Constrained Markov Decision Process w...
Partially observable Markov decision processes (POMDPs) are widely used for planning under uncertainty. In many applications, the huge size of the POMDP state space makes straightf...
Joni Pajarinen, Jaakko Peltonen, Ari Hottinen, Mik...