
CDC
2009
IEEE

Q-learning and Pontryagin's Minimum Principle

Abstract: Q-learning is a technique used to compute an optimal policy for a controlled Markov chain based on observations of the system controlled using a non-optimal policy. It has proven to be effective for models with finite state and action space. This paper establishes connections between Q-learning and nonlinear control of continuous-time models with general state space and general action space. The main contributions are summarized as follows. (i) The starting point is the observation that the “Q-function” appearing in Q-learning algorithms is an extension of the Hamiltonian that appears in the Minimum Principle. Based on this observation we introduce the steepest descent Q-learning (SDQ-learning) algorithm to obtain the optimal approximation of the Hamiltonian within a prescribed finite-dimensional function class. (ii) A transformation of the optimality equations is performed based on the adjoint of a resolvent operator. This is used to construct a consistent algorithm ba...
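
For readers unfamiliar with the finite-state setting that the abstract takes as its starting point, the following is a minimal sketch of standard tabular Q-learning. It is not the SDQ-learning algorithm of the paper. The toy chain, the unit cost per step, the discount factor, the step size, and the function names `q_learning` and `greedy_policy` are all illustrative assumptions. The system is driven by a uniformly random (hence non-optimal) policy, and the Q-function is estimated from the observed transitions.

```python
import random


def q_learning(n_states=4, gamma=0.9, alpha=0.5, episodes=2000, seed=0):
    """Tabular Q-learning on a hypothetical deterministic chain (cost-minimizing).

    States are 0..n_states-1; the last state is an absorbing goal with zero cost.
    Action 0 moves left (state 0 stays put), action 1 moves right.
    Every step taken before reaching the goal costs 1.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]  # Q[state][action]
    for _ in range(episodes):
        s = rng.randrange(goal)  # start in a random non-goal state
        for _ in range(50):  # cap the episode length
            a = rng.randrange(2)  # behavior policy: uniformly random
            s_next = max(s - 1, 0) if a == 0 else s + 1
            cost = 1.0
            # The target uses the best (minimum-cost) action at the next state,
            # regardless of which action the behavior policy will take there.
            target = cost + gamma * min(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
            if s == goal:
                break
    return Q


def greedy_policy(Q):
    """Return the cost-minimizing action for each state under the estimate Q."""
    return [min(range(len(row)), key=lambda a: row[a]) for row in Q]


if __name__ == "__main__":
    Q = q_learning()
    print(greedy_policy(Q)[:-1])  # actions for the non-goal states
```

Although the data come from a random policy, the greedy policy extracted from the learned table moves right in every non-goal state, which is optimal for this toy chain.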
Prashant G. Mehta, Sean P. Meyn
Added 21 Jul 2010
Updated 21 Jul 2010
Type Conference
Year 2009
Where CDC