Reinforcement learning algorithms that use eligibility traces, such as Sarsa(λ), have been empirically shown to be effective in learning good estimated-state-based policies in partially observable Markov decision processes (POMDPs). Nevertheless, one can construct counterexamples: problems in which Sarsa(λ < 1) fails to find a good policy even though one exists. Despite this, these algorithms remain of great interest because alternative approaches to learning in POMDPs based on approximating belief-states do not scale. In this paper we present SarsaLandmark, an algorithm for learning in POMDPs with "landmark" states (most man-made and many natural environments have landmarks). SarsaLandmark simultaneously preserves the advantages offered by eligibility traces and fixes the cause of the failure of Sarsa(λ) on the motivating counterexamples. We present a theoretical analysis of SarsaLandmark for the policy evaluation problem and present empirical results on a few learning ...
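For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of tabular Sarsa(λ) with replacing eligibility traces, applied to observations standing in for hidden state as in the POMDP setting above. The environment interface (`reset`, `step`, `actions`) and all hyperparameter values are illustrative assumptions, not from the paper; SarsaLandmark's own update rule is not reproduced here, since the abstract does not specify it.

```python
import random
from collections import defaultdict

def sarsa_lambda(env, num_episodes=500, alpha=0.1, gamma=0.99,
                 lam=0.9, epsilon=0.1):
    """Tabular Sarsa(lambda) with replacing eligibility traces.

    `env` is a hypothetical interface assumed here:
      env.reset() -> observation
      env.step(action) -> (observation, reward, done)
      env.actions -> list of available actions
    In a POMDP, `observation` plays the role of the (hidden) state.
    """
    Q = defaultdict(float)  # Q[(obs, action)] -> action-value estimate

    def epsilon_greedy(obs):
        if random.random() < epsilon:
            return random.choice(env.actions)
        return max(env.actions, key=lambda a: Q[(obs, a)])

    for _ in range(num_episodes):
        e = defaultdict(float)  # eligibility traces, cleared each episode
        obs = env.reset()
        action = epsilon_greedy(obs)
        done = False
        while not done:
            next_obs, reward, done = env.step(action)
            next_action = None if done else epsilon_greedy(next_obs)
            # TD error; bootstrap from the successor pair unless terminal.
            target = reward if done else reward + gamma * Q[(next_obs, next_action)]
            delta = target - Q[(obs, action)]
            e[(obs, action)] = 1.0  # replacing trace for the current pair
            # Credit the TD error to all recently visited pairs, then decay.
            for key in list(e):
                Q[key] += alpha * delta * e[key]
                e[key] *= gamma * lam
            obs, action = next_obs, next_action
    return Q
```

The traces `e` are what let a single TD error update many earlier observation-action pairs at once; the abstract's point is that this credit assignment can misfire under partial observability for λ < 1, which is the failure mode SarsaLandmark is designed to repair at landmark states.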
Michael R. James, Satinder P. Singh