Abstract—Linear discriminant analysis (LDA) is a wellknown dimension reduction approach, which projects highdimensional data into a low-dimensional space with the best separation...
We propose a new approach to reinforcement learning which combines least squares function approximation with policy iteration. Our method is model-free and completely off policy. ...
The goal of approximate policy evaluation is to “best” represent a target value function according to a specific criterion. Temporal difference methods and Bellman residual m...
Selection of an optimal estimator typically relies on either supervised training samples (pairs of measurements and their associated true values), or a prior probability model for...
We consider the task of reinforcement learning with linear value function approximation. Temporal difference algorithms, and in particular the Least-Squares Temporal Difference (L...