Prashant G. Mehta, Sean P. Meyn

Abstract— Q-learning is a technique used to compute an optimal policy for a controlled Markov chain based on observations of the system controlled using a non-optimal policy. It has proven to be effective for models with finite state and action spaces. This paper establishes connections between Q-learning and nonlinear control of continuous-time models with general state space and general action space. The main contributions are summarized as follows. (i) The starting point is the observation that the “Q-function” appearing in Q-learning algorithms is an extension of the Hamiltonian that appears in the Minimum Principle. Based on this observation we introduce the steepest descent Q-learning (SDQ-learning) algorithm to obtain the optimal approximation of the Hamiltonian within a prescribed finite-dimensional function class. (ii) A transformation of the optimality equations is performed based on the adjoint of a resolvent operator. This is used to construct a consistent algorithm ba...
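
As a reading aid for contribution (i), here is a minimal sketch of the Hamiltonian/Q-function connection in standard optimal-control notation; the symbols below (dynamics $f$, running cost $c$, value function $J^*$, parameter $\theta$) are illustrative assumptions and are not taken from the abstract itself.

% Sketch only; assumed notation: state equation \dot{x} = f(x,u),
% running cost c(x,u), optimal value function J^*.
\[
   H(x,u) \;=\; c(x,u) \;+\; \nabla J^*(x)\cdot f(x,u)
   \qquad \text{(Hamiltonian of the Minimum Principle)}
\]
% The HJB equation (total-cost setting) says the optimal policy
% minimizes H pointwise in u:
\[
   \min_{u} H(x,u) \;=\; 0, \qquad
   \phi^*(x) \in \arg\min_{u} H(x,u).
\]
% On this reading, Q-learning seeks the best approximation H^{\theta} of H
% within a prescribed finite-dimensional class \{H^{\theta} : \theta \in \mathbb{R}^d\},
% using only observations of the system under a (possibly non-optimal) policy.

In this hedged view, the minimization on the right recovers the policy without knowledge of the model, which is what lets an approximation of $H$ within a finite-dimensional class stand in for solving the HJB equation directly.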