We give an optimal dynamic programming algorithm to solve a class of finite-horizon decentralized Markov decision processes (MDPs). We consider problems with a broadcast informati...
In this paper, we present a new algorithm that integrates recent advances in solving continuous bandit problems with sample-based rollout methods for planning in Markov Decision P...
Christopher R. Mansley, Ari Weinstein, Michael L. ...
We present a technique for computing approximately optimal solutions to stochastic resource allocation problems modeled as Markov decision processes (MDPs). We exploit two key pro...
Nicolas Meuleau, Milos Hauskrecht, Kee-Eung Kim, L...
We consider an upper confidence bound algorithm for Markov decision processes (MDPs) with deterministic transitions. For this algorithm we derive upper bounds on the onl...
Markov decision processes (MDPs) are controllable discrete event systems with stochastic transitions. The payoff received by the controller can be evaluated in different ways, dep...