
JMLR 2002


On the Convergence of Optimistic Policy Iteration


We consider a finite-state Markov decision problem and establish the convergence of a special case of optimistic policy iteration that involves Monte Carlo estimation of Q-values, in conjunction with greedy policy selection. We provide convergence results for a number of algorithmic variations, including one that involves temporal difference learning (bootstrapping) instead of Monte Carlo estimation. We also indicate some extensions that either fail or are unlikely to go through.
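The scheme described in the abstract can be illustrated with a small sketch. This is a hypothetical example, not the paper's exact formulation. It makes several assumptions: the MDP is discounted, every state-action pair is updated at each iteration (synchronous updates), each Monte Carlo return is truncated after a fixed horizon, and the step size is 1/(t+1). The names `optimistic_pi`, `mc_return`, `sample_step`, `greedy_policy` and the transition encoding `P[s][a]` are all illustrative. At each iteration the policy is greedy with respect to the current Q-estimates. One sampled return per state-action pair then moves Q partway toward that policy's values, without evaluating the policy to convergence.

```python
import random
from typing import List, Tuple

# Hypothetical MDP encoding: P[s][a] lists (probability, next_state, reward).
Transition = Tuple[float, int, float]
MDP = List[List[List[Transition]]]


def greedy_policy(Q: List[List[float]]) -> List[int]:
    """Greedy action per state; ties go to the lowest-indexed action."""
    return [max(range(len(q)), key=lambda a: q[a]) for q in Q]


def sample_step(P: MDP, s: int, a: int, rng: random.Random) -> Tuple[int, float]:
    """Sample (next_state, reward) for taking action a in state s."""
    outcomes = P[s][a]
    _, s_next, r = rng.choices(outcomes, weights=[o[0] for o in outcomes])[0]
    return s_next, r


def mc_return(P: MDP, s: int, a: int, policy: List[int], discount: float,
              horizon: int, rng: random.Random) -> float:
    """Discounted return of one trajectory that starts with (s, a) and then
    follows `policy`, truncated after `horizon` transitions (assumption)."""
    total, weight = 0.0, 1.0
    for _ in range(horizon):
        s, r = sample_step(P, s, a, rng)
        total += weight * r
        weight *= discount
        a = policy[s]
    return total


def optimistic_pi(P: MDP, discount: float = 0.9, iterations: int = 1000,
                  horizon: int = 200, seed: int = 0):
    """Synchronous optimistic policy iteration with Monte Carlo Q estimates."""
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for t in range(iterations):
        policy = greedy_policy(Q)      # greedy w.r.t. the current estimate
        step = 1.0 / (t + 1)           # diminishing step size (assumption)
        for s in range(n_states):
            for a in range(n_actions):
                # One sampled return per pair; the policy is not evaluated
                # to convergence, which is what makes the scheme "optimistic".
                g = mc_return(P, s, a, policy, discount, horizon, rng)
                Q[s][a] += step * (g - Q[s][a])
    return Q, greedy_policy(Q)


# Made-up two-state, two-action example with deterministic transitions.
P = [
    [[(1.0, 0, 1.0)], [(1.0, 1, 0.0)]],  # state 0: a0 stays (+1), a1 -> state 1 (+0)
    [[(1.0, 0, 0.0)], [(1.0, 1, 2.0)]],  # state 1: a0 -> state 0 (+0), a1 stays (+2)
]
Q, policy = optimistic_pi(P)
print(policy)  # [1, 1]
```

In this made-up example with discount 0.9, the optimal Q-values are 17.2 and 18 in state 0 and 16.2 and 20 in state 1. The estimates approach these values as the number of iterations grows, because the 1/(t+1) step size averages all past returns. The temporal-difference (bootstrapping) variant mentioned in the abstract would replace the full sampled return with a bootstrapped target. It is not shown here.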

John N. Tsitsiklis


Related Content

» Model-free reinforcement learning as mixture learning

» Point-Based Policy Iteration

» The Convergence of Iterated Classification

» Q-learning and enhanced policy iteration in discounted dynamic programming

» Regularized Policy Iteration

» Multiagent Learning Dynamics: A Survey

» Exploiting locality of interactions using a policy-gradient approach in multiagent learning

» Multi-Agent Learning with Policy Prediction

» Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms

Post Info

Added	22 Dec 2010
Updated	22 Dec 2010
Type	Journal
Year	2002
Where	JMLR
Authors	John N. Tsitsiklis
