
Differential Eligibility Vectors for Advantage Updating and Gradient Methods

In this paper we propose differential eligibility vectors (DEV) for temporal-difference (TD) learning, a new class of eligibility vectors designed to bring out the contribution of each action to the TD-error at each state. Specifically, we use DEV in TD-Q(λ) to more accurately learn the relative value of the actions, rather than their absolute value. We identify conditions that ensure convergence w.p.1 of TD-Q(λ) with DEV, and show that this algorithm can also be used to directly approximate the advantage function associated with a given policy, without the need to compute an auxiliary function, something that, to the best of our knowledge, was not previously known to be possible. Finally, we discuss the integration of DEV in LSTDQ and actor-critic algorithms.
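The abstract does not spell out how a differential eligibility vector is constructed. As a reading aid, the snippet below is a minimal, hypothetical Python sketch of a TD-Q(λ)-style update with linear function approximation, assuming the differential trace accumulates phi(x, a) minus the policy-averaged feature vector at x, so that the learned weights track a relative (advantage-like) quantity rather than an absolute value. The one-hot features, toy environment, and this particular DEV construction are illustrative assumptions, not the paper's exact algorithm.

    import numpy as np

    # Illustrative sketch only: TD-Q(lambda)-style policy evaluation with a
    # "differential" eligibility trace. The DEV construction here (subtracting
    # the policy-averaged feature) is an assumption made for illustration.

    rng = np.random.default_rng(0)

    n_states, n_actions, n_features = 5, 3, 15
    gamma, lam, alpha = 0.95, 0.8, 0.1

    def phi(x, a):
        # One-hot feature for the state-action pair (illustrative choice).
        f = np.zeros(n_features)
        f[x * n_actions + a] = 1.0
        return f

    # Fixed stochastic policy to evaluate (uniform, for simplicity).
    pi = np.full((n_states, n_actions), 1.0 / n_actions)

    theta = np.zeros(n_features)  # theta @ phi(x, a) ~ relative value of (x, a)
    z = np.zeros(n_features)      # differential eligibility vector

    x = int(rng.integers(n_states))
    for t in range(1000):
        a = rng.choice(n_actions, p=pi[x])
        x_next = int(rng.integers(n_states))  # stand-in for environment dynamics
        r = float(rng.normal())               # stand-in for the reward signal

        # Differential feature: subtract the policy average so only the
        # action-dependent (advantage-like) component enters the trace.
        phi_bar = sum(pi[x, b] * phi(x, b) for b in range(n_actions))

        # TD error with the policy-averaged next value, as in TD-Q(lambda)
        # policy evaluation.
        q_next = sum(pi[x_next, b] * (theta @ phi(x_next, b))
                     for b in range(n_actions))
        delta = r + gamma * q_next - theta @ phi(x, a)

        z = gamma * lam * z + (phi(x, a) - phi_bar)  # differential trace update
        theta += alpha * delta * z
        x = x_next

With a standard trace that accumulates phi(x, a) itself, the same loop would estimate absolute Q-values; the subtraction of phi_bar is what biases the estimate toward the relative value of actions, which is the role the abstract attributes to DEV.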
Type Conference
Year 2011
Where AAAI
Authors Francisco S. Melo