Most formulations of Reinforcement Learning depend on a single reinforcement reward value to guide the search for the optimal policy solution. If observation of this reward is rar...
For global motion estimation in model-based coding, this paper proposes a constrained spatio-temporal gradient method using contour information. To overcome the local minimum prob...
Young Wook Sohn, Doo-Hyun Kim, Dong-O Kim, Rae-Hon...
Abstract-- Policy Gradients with Parameter-based Exploration (PGPE) is a novel model-free reinforcement learning method that alleviates the problem of high-variance gradient estima...
Frank Sehnke, Alex Graves, Christian Osendorfer, J...
This paper considers the least-square online gradient descent algorithm in a reproducing kernel Hilbert space (RKHS) without explicit regularization. We present a novel capacity i...
— The aquisition and improvement of motor skills and control policies for robotics from trial and error is of essential importance if robots should ever leave precisely pre-struc...