Policy gradient methods for reinforcement learning avoid some of the undesirable properties of the value function approaches, such as policy degradation (Baxter and Bartlett, 2001...
Evan Greensmith, Peter L. Bartlett, Jonathan Baxte...
We study a sequential variance reduction technique for Monte Carlo estimation of functionals in Markov Chains. The method is based on designing sequential control variates using s...
This paper presents a methodology for protecting low-priority best-effort (BE) traffic in a network domain that provides both virtual-circuit routing with bandwidth reservation for...
Abstract—We provide a formal model for the Change Management process for Enterprise IT systems, and develop change scheduling algorithms that seek to attain the “change capacit...
We consider the problem of finding the best arm in a stochastic multi-armed bandit game. The regret of a forecaster is here defined by the gap between the mean reward of the optim...