Policy gradient methods for reinforcement learning avoid some of the undesirable properties of the value function approaches, such as policy degradation (Baxter and Bartlett, 2001...
Evan Greensmith, Peter L. Bartlett, Jonathan Baxte...
A general and expressive model of sequential decision making under uncertainty is provided by the Markov decision processes (MDPs) framework. Complex applications with very large ...
— An important problem in robotics is planning and selecting actions for goal-directed behavior in noisy uncertain environments. The problem is typically addressed within the fra...
We consider the problem of cooperative multiagent planning under uncertainty, formalized as a decentralized partially observable Markov decision process (Dec-POMDP). Unfortunately...
Matthijs T. J. Spaan, Frans A. Oliehoek, Nikos A. ...
This paper models and analyzes serial production lines with specialists at each station and a single, cross-trained floating worker who can work at any station. We formulate Marko...
Linn I. Sennott, Mark P. Van Oyen, Seyed M. R. Ira...