Sciweavers

ATAL
2009
Springer

SarsaLandmark: an algorithm for learning in POMDPs with landmarks

Reinforcement learning algorithms that use eligibility traces, such as Sarsa(λ), have been empirically shown to be effective in learning good estimated-state-based policies in partially observable Markov decision processes (POMDPs). Nevertheless, one can construct counterexamples: problems in which Sarsa(λ < 1) fails to find a good policy even though one exists. Despite this, these algorithms remain of great interest because alternative approaches to learning in POMDPs based on approximating belief-states do not scale. In this paper we present SarsaLandmark, an algorithm for learning in POMDPs with "landmark" states (most man-made and many natural environments have landmarks). SarsaLandmark simultaneously preserves the advantages offered by eligibility traces and fixes the cause of the failure of Sarsa(λ) on the motivating counterexamples. We present a theoretical analysis of SarsaLandmark for the policy evaluation problem and present empirical results on a few learning ...
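For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of one tabular Sarsa(λ) update with accumulating eligibility traces, keyed by observations rather than true states. It is the standard textbook form and not the paper's SarsaLandmark algorithm, whose landmark-specific modification is not described in this abstract. The function name, the dictionary layout, and the default hyperparameters are assumptions chosen for illustration.

```python
from collections import defaultdict


def sarsa_lambda_step(Q, E, obs, act, reward, next_obs, next_act,
                      alpha=0.1, gamma=0.9, lam=0.8, terminal=False):
    """Apply one Sarsa(lambda) update in place and return the TD error.

    Q and E map (observation, action) pairs to action values and
    eligibility traces. All names here are illustrative assumptions.
    """
    # Bootstrapped target; no bootstrapping from a terminal step.
    target = reward if terminal else reward + gamma * Q[(next_obs, next_act)]
    delta = target - Q[(obs, act)]

    # Accumulating trace for the pair that was just visited.
    E[(obs, act)] += 1.0

    # Every pair with a non-zero trace shares in the TD error,
    # then its trace decays by gamma * lambda.
    for key in list(E):
        Q[key] += alpha * delta * E[key]
        E[key] *= gamma * lam

    # Traces do not carry over between episodes.
    if terminal:
        E.clear()
    return delta


# Usage: a two-step episode A -> B -> end, with reward 1 on the last step.
Q = defaultdict(float)
E = defaultdict(float)
sarsa_lambda_step(Q, E, "A", 0, 0.0, "B", 1)
sarsa_lambda_step(Q, E, "B", 1, 1.0, None, None, terminal=True)
```

With λ > 0 the final reward also updates the earlier pair ("A", 0) through its decayed trace, which is the credit-assignment property that makes eligibility traces useful when observations alone do not identify the state.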
Michael R. James, Satinder P. Singh
Added: 26 May 2010
Updated: 26 May 2010
Type: Conference
Year: 2009
Where: ATAL
Authors: Michael R. James, Satinder P. Singh