This site uses cookies to deliver our services and to ensure you get the best experience. By continuing to use this site, you consent to our use of cookies and acknowledge that you have read and understand our Privacy Policy, Cookie Policy, and Terms
We consider a finite-state Markov decision problem and establish the convergence of a special case of optimistic policy iteration that involves Monte Carlo estimation of Q-values,...
We cast model-free reinforcement learning as the problem of maximizing the likelihood of a probabilistic mixture model via sampling, addressing both the infinite and finite horizo...