Policy gradient methods for reinforcement learning avoid some of the undesirable properties of the value function approaches, such as policy degradation (Baxter and Bartlett, 2001...
Evan Greensmith, Peter L. Bartlett, Jonathan Baxte...
Abstract. We examine the fair allocation of capacity to a large population of best-effort connections in a typical multiple access communication system supporting some bandwidth on...
In real-time animation systems, motion interpolation techniques are widely used for their controllability and efficiency. The techniques sample the parameter space using example mo...
Dynamic Programming, Q-learning and other discrete Markov Decision Process solvers can be applied to continuous d-dimensional state-spaces by quantizing the state space into an arr...
that the equivalent channel is approximately an impulse. In [7], Martin et al. propose a globally convergent blind adaptive ... In this paper, we propose a frequency domain based de-...