Least squares (LS) fitting is one of the most fundamental techniques in science and engineering. It is used to estimate parameters from multiple noisy observations. In many probl...
—We investigate the problem of minimizing the overall transmission delay of data packets in a single-user wireless communication system, where the transmitter has a fixed amount...
We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learning m...
Shalabh Bhatnagar, Richard S. Sutton, Mohammad Gha...
We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradie...
Relational Markov Decision Processes (MDP) are a useraction for stochastic planning problems since one can develop abstract solutions for them that are independent of domain size ...