Postponed Updates for Temporal-Difference Reinforcement Learning

This paper presents postponed updates, a new strategy for TD methods that can improve sample efficiency without incurring the computational and space requirements of model-based RL. By recording the agent’s last-visit experience, the agent can delay its update until the given state is revisited, thereby improving the quality of the update. Experimental results demonstrate that postponed updates outperforms several competitors, most notably eligibility traces, a traditional way to improve the sample efficiency of TD methods. It achieves this without the need to tune an extra parameter as is needed for eligibility traces.
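
The idea in the abstract can be illustrated with a minimal tabular sketch. This is one possible reading of "delay the update until the state is revisited", applied here to a Q-learning-style backup, and is not taken from the paper itself. The class name `PostponedQLearner`, its methods, and all parameter values are hypothetical. Each state stores only its most recent transition, and the backup for that transition is computed when the state is next visited, so the target can use successor values learned in the meantime.

```python
import random
from collections import defaultdict


class PostponedQLearner:
    """Tabular Q-learning variant with per-state delayed updates (illustrative only)."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.pending = {}            # state -> (action, reward, next_state, done)
        self.rng = random.Random(seed)

    def _max_q(self, state):
        return max(self.q[(state, a)] for a in self.actions)

    def _apply(self, state):
        """Perform the stored last-visit update for `state`, if there is one."""
        experience = self.pending.pop(state, None)
        if experience is None:
            return
        action, reward, next_state, done = experience
        # The target is computed now, with the current estimate of next_state.
        target = reward if done else reward + self.gamma * self._max_q(next_state)
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

    def act(self, state):
        """Revisit `state`: apply its delayed update, then pick an epsilon-greedy action."""
        self._apply(state)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def record(self, state, action, reward, next_state, done):
        """Store the transition instead of updating immediately."""
        self._apply(state)  # keep at most one pending transition per state
        self.pending[state] = (action, reward, next_state, done)

    def flush(self):
        """Apply all remaining delayed updates, e.g. when learning stops."""
        for state in list(self.pending):
            self._apply(state)
```

In a training loop under these assumptions, the agent would call `act(state)` to choose an action, step the environment, and then call `record(...)` with the observed transition; `flush()` applies whatever updates are still outstanding. The memory cost is one stored transition per visited state, which is far less than a full model.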

Harm van Seijen, Shimon Whiteson

Eligibility Traces | ISDA 2009 | Operating Systems | Sample Efficiency | TD Methods

» Incremental Natural Actor-Critic Algorithms

» A Convergent Online Single Time Scale Actor Critic Algorithm

» Collaborative Multiagent Reinforcement Learning by Payoff Propagation

» Least Squares SVM for Least Squares TD Learning

Post Info

Added	24 May 2010
Updated	24 May 2010
Type	Conference
Year	2009
Where	ISDA
Authors	Harm van Seijen, Shimon Whiteson
