We consider the task of reinforcement learning with linear value function approximation. Temporal difference algorithms, and in particular the Least-Squares Temporal Difference (L...
We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms...
Residual gradient (RG) was proposed as an alternative to TD(0) for policy evaluation when function approximation is used, but there exists little formal analysis comparing them ex...
TD() is a popular family of algorithms for approximate policy evaluation in large MDPs. TD() works by incrementally updating the value function after each observed transition. It h...
Reinforcement learning has been used for training game playing agents. The value function for a complex game must be approximated with a continuous function because the number of ...