Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path