We present the first temporal-difference learning algorithm for off-policy control with unrestricted linear function approximation whose per-time-step complexity is linear in the ...
Recent advances in flash media have made it an attractive alternative for data storage in a wide spectrum of computing devices, such as embedded sensors, mobile phones, PDA's...
: This paper proposes twin prototype support vector machine (TVM), a constant space and sublinear time support vector machine (SVM) algorithm for online learning. TVM achieves its ...
Eligibility traces have been shown to speed reinforcement learning, to make it more robust to hidden states, and to provide a link between Monte Carlo and temporal-difference meth...
Doina Precup, Richard S. Sutton, Satinder P. Singh
We investigate the problem of finding the metric structure of a general 3D scene viewed by a moving camera with square pixels and constant unknown focal length. While the problem ...