Sciweavers

102 search results - page 11 / 21
» MDPs with Non-Deterministic Policies
AAAI
2007
13 years 9 months ago
Authorial Idioms for Target Distributions in TTD-MDPs
In designing Markov Decision Processes (MDP), one must define the world, its dynamics, a set of actions, and a reward function. MDPs are often applied in situations where there i...
David L. Roberts, Sooraj Bhat, Kenneth St. Clair, ...
AAAI
2007
13 years 9 months ago
Thresholded Rewards: Acting Optimally in Timed, Zero-Sum Games
In timed, zero-sum games, the goal is to maximize the probability of winning, which is not necessarily the same as maximizing our expected reward. We consider cumulative intermedi...
Colin McMillen, Manuela M. Veloso
AI
2000
Springer
13 years 7 months ago
Stochastic dynamic programming with factored representations
Markov decision processes (MDPs) have proven to be popular models for decision-theoretic planning, but standard dynamic programming algorithms for solving MDPs rely on explicit, stat...
Craig Boutilier, Richard Dearden, Moisés Go...
ICML
2009
IEEE
14 years 8 months ago
Model-free reinforcement learning as mixture learning
We cast model-free reinforcement learning as the problem of maximizing the likelihood of a probabilistic mixture model via sampling, addressing both the infinite and finite horizo...
Nikos Vlassis, Marc Toussaint
AAAI
2007
13 years 9 months ago
Purely Epistemic Markov Decision Processes
Planning under uncertainty involves two distinct sources of uncertainty: uncertainty about the effects of actions and uncertainty about the current state of the world. The most wi...
Régis Sabbadin, Jérôme Lang, N...