
ECML
2007
Springer

Safe Q-Learning on Complete History Spaces

In this article, we present an approach to solving deterministic partially observable Markov decision processes (POMDPs) based on a history space containing sequences of past observations and actions. A novel and sound technique for learning a Q-function on history spaces is developed and discussed. We analyze conditions under which a history-based approach can learn policies comparable to the optimal solution on belief states. The presented algorithm is model-free and can be combined with any method for learning history spaces. We also present a procedure that learns history spaces especially suited to our Q-learning algorithm.
Stephan Timmer, Martin Riedmiller
Added 07 Jun 2010
Updated 07 Jun 2010
Type Conference
Year 2007
Where ECML
Authors Stephan Timmer, Martin Riedmiller