Sciweavers

JMLR
2002
100views more  JMLR 2002»
13 years 11 months ago
On the Convergence of Optimistic Policy Iteration
We consider a finite-state Markov decision problem and establish the convergence of a special case of optimistic policy iteration that involves Monte Carlo estimation of Q-values,...
John N. Tsitsiklis
ICML
2009
IEEE
15 years 7 days ago
Model-free reinforcement learning as mixture learning
We cast model-free reinforcement learning as the problem of maximizing the likelihood of a probabilistic mixture model via sampling, addressing both the infinite and finite horizo...
Nikos Vlassis, Marc Toussaint