Search Sciweavers | Sciweavers

85 search results - page 6 / 17

» Approximate Policy Iteration with a Policy Language Bias

DEDS
2010

On Regression-Based Stopping Times

13 years 7 months ago

We study approaches that fit a linear combination of basis functions to the continuation value function of an optimal stopping problem and then employ a greedy policy based on the...

Benjamin Van Roy

PKDD
2010
Springer

Efficient Planning in Large POMDPs through Policy Graph Based Factorized Approximations

13 years 5 months ago

Partially observable Markov decision processes (POMDPs) are widely used for planning under uncertainty. In many applications, the huge size of the POMDP state space makes straightf...

Joni Pajarinen, Jaakko Peltonen, Ari Hottinen, Mik...

PKDD
2009
Springer

Hybrid Least-Squares Algorithms for Approximate Policy Evaluation

14 years 1 months ago

The goal of approximate policy evaluation is to “best” represent a target value function according to a specific criterion. Temporal difference methods and Bellman residual m...

Jeffrey Johns, Marek Petrik, Sridhar Mahadevan

NIPS
2004

VDCBPI: an Approximate Scalable Algorithm for Large POMDPs

13 years 8 months ago

Existing algorithms for discrete partially observable Markov decision processes can at best solve problems of a few thousand states due to two important sources of intractability:...

Pascal Poupart, Craig Boutilier

ICML
2009
IEEE

Model-free reinforcement learning as mixture learning

14 years 8 months ago

We cast model-free reinforcement learning as the problem of maximizing the likelihood of a probabilistic mixture model via sampling, addressing both the infinite and finite horizo...

Nikos Vlassis, Marc Toussaint
