Residual gradient (RG) was proposed as an alternative to TD(0) for policy evaluation when function approximation is used, but there exists little formal analysis comparing them ex...
In this paper we propose differential eligibility vectors (DEV) for temporal-difference (TD) learning, a new class of eligibility vectors designed to bring out the contribution of...
Inspired by a broader perspective viewing intelligent system dynamics in terms of the geometry of “cognitive spaces,” we conduct a preliminary investigation of the application ...
We1 present a new actor-critic learning model in which a Bayesian class of non-parametric critics, using Gaussian process temporal difference learning is used. Such critics model ...
We develop a protocol for optimizing dynamic behavior of a network of simple electronic components, such as a sensor network, an ad hoc network of mobile devices, or a network of ...