This paper investigates a novel model-free reinforcement learning architecture, the Natural Actor-Critic. The actor updates are based on stochastic policy gradients employing Amari...
We consider a general class of regularization methods which learn a vector of parameters on the basis of linear measurements. It is well known that if the regularizer is a nondecr...
Andreas Argyriou, Charles A. Micchelli, Massimilia...
The max-sum classifier predicts n-tuple of labels from n-tuple of observable variables by maximizing a sum of quality functions defined over neighbouring pairs of labels and obser...
We propose a new approach to value function approximation which combines linear temporal difference reinforcement learning with subspace identification. In practical applications...
This paper summarizes research on a new emerging framework for learning to plan using the Markov decision process model (MDP). In this paradigm, two approaches to learning to plan...
Sridhar Mahadevan, Sarah Osentoski, Jeffrey Johns,...