In reinforcement learning, least-squares temporal difference methods (e.g., LSTD and LSPI) are effective, data-efficient techniques for policy evaluation and control with linear v...
Michael H. Bowling, Alborz Geramifard, David Winga...
We provide an analytical comparison between discounted and average reward temporal-difference (TD) learning with linearly parameterized approximations. We first consider the asympt...
We propose a new approach to value function approximation which combines linear temporal difference reinforcement learning with subspace identification. In practical applications...
Some of the most successful recent applications of reinforcement learning have used neural networks and the TD algorithm to learn evaluation functions. In this paper, we examine t...
With the goal to generate more scalable algorithms with higher efficiency and fewer open parameters, reinforcement learning (RL) has recently moved towards combining classical tec...