In this paper, we present a new algorithm that integrates recent advances in solving continuous bandit problems with sample-based rollout methods for planning in Markov Decision P...
Christopher R. Mansley, Ari Weinstein, Michael L. ...
We present two new algorithms for finding optimal strategies for discounted, infinite-horizon, Deterministic Markov Decision Processes (DMDP). The first one is an adaptation of...
Inference in Markov Decision Processes has recently received interest as a means to infer goals of an observed action, policy recognition, and also as a tool to compute policies. ...
Recent decision-theoric planning algorithms are able to find optimal solutions in large problems, using Factored Markov Decision Processes (fmdps). However, these algorithms need ...
Thomas Degris, Olivier Sigaud, Pierre-Henri Wuille...
Abstract. We consider an upper confidence bound algorithm for Markov decision processes (MDPs) with deterministic transitions. For this algorithm we derive upper bounds on the onl...