Sciweavers

CDC
2009
IEEE
132views Control Systems» more  CDC 2009»
14 years 5 months ago
Q-learning and Pontryagin's Minimum Principle
Abstract— Q-learning is a technique used to compute an optimal policy for a controlled Markov chain based on observations of the system controlled using a non-optimal policy. It ...
Prashant G. Mehta, Sean P. Meyn