We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms...
Abstract Partially observable Markov decision processes (POMDPs) are a principled mathematical framework for planning under uncertainty, a crucial capability for reliable operation...
Hanna Kurniawati, Yanzhu Du, David Hsu, Wee Sun Le...