Search Sciweavers | Sciweavers

Approximate policy iteration methods based on temporal differences are popular in practice, and have been tested extensively, dating to the early nineties, but the associated conve...

Dimitri P. Bertsekas

CDC
2010
IEEE

Q-learning and enhanced policy iteration in discounted dynamic programming

14 years 11 months ago

We consider the classical finite-state discounted Markovian decision problem, and we introduce a new policy iteration-like algorithm for finding the optimal state costs or Q-facto...

Dimitri P. Bertsekas, Huizhen Yu

EOR
2011

Analysis of stochastic dual dynamic programming method

14 years 11 months ago