We propose a new approach to value function approximation which combines linear temporal difference reinforcement learning with subspace identification. In practical applications...
In this paper we present an approach for reducing the memory footprint requirement of temporal difference methods in which the set of states is finite. We use case-based generaliza...
TD() is a popular family of algorithms for approximate policy evaluation in large MDPs. TD() works by incrementally updating the value function after each observed transition. It h...
The game of Go has a high branching factor that defeats the tree search approach used in computer chess, and long-range spatiotemporal interactions that make position evaluation e...
Nicol N. Schraudolph, Peter Dayan, Terrence J. Sej...