Sciweavers

Search results for "Approximate Policy Iteration with a Policy Language Bias" (85 results, page 6 of 17)
DEDS
2010
13 years 7 months ago
On Regression-Based Stopping Times
We study approaches that fit a linear combination of basis functions to the continuation value function of an optimal stopping problem and then employ a greedy policy based on the...
Benjamin Van Roy
PKDD
2010
Springer
Data Mining
13 years 5 months ago
Efficient Planning in Large POMDPs through Policy Graph Based Factorized Approximations
Partially observable Markov decision processes (POMDPs) are widely used for planning under uncertainty. In many applications, the huge size of the POMDP state space makes straightf...
Joni Pajarinen, Jaakko Peltonen, Ari Hottinen, Mik...
PKDD
2009
Springer
Data Mining
14 years 1 months ago
Hybrid Least-Squares Algorithms for Approximate Policy Evaluation
The goal of approximate policy evaluation is to “best” represent a target value function according to a specific criterion. Temporal difference methods and Bellman residual m...
Jeffrey Johns, Marek Petrik, Sridhar Mahadevan
NIPS
2004
13 years 8 months ago
VDCBPI: an Approximate Scalable Algorithm for Large POMDPs
Existing algorithms for discrete partially observable Markov decision processes can at best solve problems of a few thousand states due to two important sources of intractability:...
Pascal Poupart, Craig Boutilier
ICML
2009
IEEE
14 years 8 months ago
Model-free reinforcement learning as mixture learning
We cast model-free reinforcement learning as the problem of maximizing the likelihood of a probabilistic mixture model via sampling, addressing both the infinite and finite horizo...
Nikos Vlassis, Marc Toussaint