— We introduce the Oracular Partially Observable Markov Decision Process (OPOMDP), a type of POMDP in which the world produces no observations; instead there is an “oracle,” ...
A key problem in reinforcement learning is finding a good balance between the need to explore the environment and the need to gain rewards by exploiting existing knowledge. Much ...
Recent research in decision theoretic planning has focussedon making the solution of Markov decision processes (MDPs) more feasible. We develop a family of algorithms for structur...
Craig Boutilier, Ronen I. Brafman, Christopher W. ...
Markov Decision Processes (MDP) have been widely used as a framework for planning under uncertainty. They allow to compute optimal sequences of actions in order to achieve a given...
— We consider decision making in a Markovian setup where the reward parameters are not known in advance. Our performance criterion is the gap between the performance of the best ...