Sciweavers

178

CDC
2010
IEEE

139views Control Systems» more CDC 2010»

Q-learning and enhanced policy iteration in discounted dynamic programming

15 years 1 months ago

We consider the classical finite-state discounted Markovian decision problem, and we introduce a new policy iteration-like algorithm for finding the optimal state costs or Q-facto...

Dimitri P. Bertsekas, Huizhen Yu

claim paper

Read More »

Sciweavers

Explore & Download

Productivity Tools

Sciweavers