We derive a knowledge gradient policy for an optimal learning problem on a graph, in which we use sequential measurements to refine Bayesian estimates of individual edge values i...
In this paper we propose a financial trading system whose strategy is developed by means of an artificial neural network approach based on a recurrent reinforcement learning algo...
We consider a push pull queueing system with two servers and two types of jobs which are processed by the two servers in opposite order, with stochastic generally distributed proc...
We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learning m...
Shalabh Bhatnagar, Richard S. Sutton, Mohammad Gha...
—In this paper, we derive a time-complexity bound for the gradient projection method for optimal routing in data networks. This result shows that the gradient projection algorith...