We consider the problem of efficiently learning optimal control policies and value functions over large state spaces in an online setting in which estimates must be available afte...
Recent advances in technology allow multi-agent systems to be deployed in cooperation with or as a service for humans. Typically, those systems are designed assuming individually ...
T ORDER REGRESSION (EXTENDED ABSTRACT). Kurt Driessens (a), Saso Dzeroski (b). (a) Department of Computer Science, University of Waikato, Hamilton, New Zealand (kurtd@waikato.ac.nz); (b) Departm...
We present Policy Gradient Actor-Critic (PGAC), a new model-free Reinforcement Learning (RL) method for creating limited-memory stochastic policies for Partially Observable Markov ...
The control of high-dimensional, continuous, non-linear dynamical systems is a key problem in reinforcement learning and control. Local, trajectory-based methods, using techniques...