Residual gradient (RG) was proposed as an alternative to TD(0) for policy evaluation when function approximation is used, but there exists little formal analysis comparing them ex...
Reinforcement learning algorithms that use eligibility traces, such as Sarsa(λ), have been empirically shown to be effective in learning good estimated-state-based policies in pa...
We define the class of SMART scheduling policies. These are policies that bias towards jobs with small remaining service times, jobs with small original sizes, or both, with the ...
Adam Wierman, Mor Harchol-Balter, Takayuki Osogami
Abstract— We propose a planning algorithm that allows usersupplied domain knowledge to be exploited in the synthesis of information feedback policies for systems modeled as parti...
Salvatore Candido, James C. Davidson, Seth Hutchin...
Abstract— This paper proposes a simulation-based active policy learning algorithm for finite-horizon, partially-observed sequential decision processes. The algorithm is tested i...
Ruben Martinez-Cantin, Nando de Freitas, Arnaud Do...