Sciweavers

CDC
2010
IEEE
139views Control Systems» more  CDC 2010»
13 years 5 months ago
Q-learning and enhanced policy iteration in discounted dynamic programming
We consider the classical finite-state discounted Markovian decision problem, and we introduce a new policy iteration-like algorithm for finding the optimal state costs or Q-facto...
Dimitri P. Bertsekas, Huizhen Yu