We present a new algorithm, called incremental least squares policy iteration (ILSPI), for finding the infinite-horizon stationary policy for partially observable Markov decision ...
We give a new practical algorithm to compute, in finite time, a fixpoint (and often the least fixpoint) of a system of equations in the abstract numerical domains of zones and t...
Stephane Gaubert, Eric Goubault, Ankur Taly, Sarah...
We consider a finite-state Markov decision problem and establish the convergence of a special case of optimistic policy iteration that involves Monte Carlo estimation of Q-values,...
Abstract— Least-squares policy iteration is a useful reinforcement learning method in robotics due to its computational efficiency. However, it tends to be sensitive to outliers...
We propose a new approach to reinforcement learning which combines least squares function approximation with policy iteration. Our method is model-free and completely off policy. ...