Partially-observable Markov decision processes (POMDPs) provide a powerful model for sequential decision-making problems with partially-observed state and are known to have (appro...
We present an algorithm called Optimistic Linear Programming (OLP) for learning to optimize average reward in an irreducible but otherwise unknown Markov decision process (MDP). O...
Policy gradient methods for reinforcement learning avoid some of the undesirable properties of the value function approaches, such as policy degradation (Baxter and Bartlett, 2001...
Evan Greensmith, Peter L. Bartlett, Jonathan Baxte...
Abstract— In this paper, we propose a way of achieving optimality in radio resource management (RRM) for heterogeneous networks. We consider a micro or femto cell with two co-loc...
Marceau Coupechoux, Jean Marc Kelif, Philippe Godl...
Optimal resource scheduling in multiagent systems is a computationally challenging task, particularly when the values of resources are not additive. We consider the combinatorial ...
Dmitri A. Dolgov, Michael R. James, Michael E. Sam...