Appropriately designing sampling policies is highly important for obtaining better control policies in reinforcement learning. In this paper, we first show that the least-squares ...
In this paper we propose differential eligibility vectors (DEV) for temporal-difference (TD) learning, a new class of eligibility vectors designed to bring out the contribution of...
Policy-based networks can be customized by users by injecting programs called policies into the network nodes. So if general-purpose functions can be specified in a policy-based ne...
We consider the problem of estimating the policy gradient in Partially Observable Markov Decision Processes (POMDPs) with a special class of policies that are based on Predictive ...
The requirement for 24/7 availability of distributed applications complicates their maintenance and evolution as shutting down such applications to perform updates may not be an a...
Robert Pawel Bialek, Eric Jul, Jean-Guy Schneider,...