Sciweavers

102 search results - page 13 / 21
» MDPs with Non-Deterministic Policies
NIPS
1998
13 years 8 months ago
Gradient Descent for General Reinforcement Learning
A simple learning rule is derived, the VAPS algorithm, which can be instantiated to generate a wide range of new reinforcement-learning algorithms. These algorithms solve a number ...
Leemon C. Baird III, Andrew W. Moore
ML
2002
ACM
13 years 7 months ago
A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes
An issue that is critical for the application of Markov decision processes (MDPs) to realistic problems is how the complexity of planning scales with the size of the MDP. In stochas...
Michael J. Kearns, Yishay Mansour, Andrew Y. Ng
ALT
2008
Springer
14 years 4 months ago
Online Regret Bounds for Markov Decision Processes with Deterministic Transitions
We consider an upper confidence bound algorithm for Markov decision processes (MDPs) with deterministic transitions. For this algorithm we derive upper bounds on the onl...
Ronald Ortner
ICTAI
2006
IEEE
14 years 1 month ago
A New Hybrid GA-MDP Algorithm For The Frequency Assignment Problem
We propose a novel algorithm called GA-MDP for solving the frequency assignment problem. GA-MDP inherits the spirit of genetic algorithms with an adaptation of Markov Decision Proc...
Lhassane Idoumghar, René Schott
AAAI
1997
13 years 8 months ago
Model Minimization in Markov Decision Processes
Many stochastic planning problems can be represented using Markov Decision Processes (MDPs). A difficulty with using these MDP representations is that the common algorithms for so...
Thomas Dean, Robert Givan