In designing Markov Decision Processes (MDP), one must define the world, its dynamics, a set of actions, and a reward function. MDPs are often applied in situations where there i...
David L. Roberts, Sooraj Bhat, Kenneth St. Clair, ...
In timed, zero-sum games, the goal is to maximize the probability of winning, which is not necessarily the same as maximizing our expected reward. We consider cumulative intermedi...
Markov decisionprocesses(MDPs) haveproven to be popular models for decision-theoretic planning, but standard dynamic programming algorithms for solving MDPs rely on explicit, stat...
We cast model-free reinforcement learning as the problem of maximizing the likelihood of a probabilistic mixture model via sampling, addressing both the infinite and finite horizo...
Planning under uncertainty involves two distinct sources of uncertainty: uncertainty about the effects of actions and uncertainty about the current state of the world. The most wi...