Policy gradient methods for reinforcement learning avoid some of the undesirable properties of the value function approaches, such as policy degradation (Baxter and Bartlett, 2001...
Evan Greensmith, Peter L. Bartlett, Jonathan Baxte...
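The snippet above names policy gradient methods only in general terms. As a purely illustrative sketch of that class of estimators (not the specific method or experiments of the paper, which are not shown here), the following toy example applies a plain REINFORCE-style gradient update to a softmax policy on an assumed 2-armed bandit; every name and number is chosen only for illustration.

```python
# Illustrative sketch: a minimal REINFORCE-style policy gradient estimate for a
# softmax policy on a toy 2-armed bandit. The bandit, step sizes, and names are
# assumptions for illustration; this is not the paper's estimator.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.8])   # expected reward of each arm (toy data)
theta = np.zeros(2)                  # policy parameters: one logit per arm

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

alpha = 0.1
for step in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)                   # sample an action from the policy
    r = rng.normal(true_means[a], 1.0)           # noisy reward
    grad_log_pi = -probs                         # d/dtheta log pi(a | theta)
    grad_log_pi[a] += 1.0
    theta += alpha * r * grad_log_pi             # REINFORCE update (no baseline)

print(softmax(theta))  # probability mass should concentrate on the better arm
```

Because the raw reward multiplies the score function directly, this estimator is unbiased but can have high variance; subtracting a baseline from `r` is the usual remedy.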
The resource-constrained project scheduling problem (RCPSP) is an NP-hard benchmark problem in scheduling which takes into account the limited availability of resources in ...
We present a dynamic programming approach for the solution of first-order Markov decision processes. This technique uses an MDP whose dynamics are represented in a variant of the ...
Several reinforcement-learning techniques have already been applied to the Acrobot control problem, using linear function approximators to estimate the value function. In this pape...
Reinforcement learning problems are commonly tackled with temporal difference methods, which attempt to estimate the agent's optimal value function. In most real-world proble...
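The abstract above refers to temporal difference methods for estimating a value function. As a minimal sketch of the standard tabular TD(0) update (a textbook baseline, not the contribution of the paper), here is value estimation on an assumed 5-state random-walk chain; the environment and all names are illustrative assumptions.

```python
# Minimal sketch of tabular TD(0) value estimation on a toy 5-state random walk.
# The chain, rewards, and parameters are assumptions chosen for illustration only.
import random

n_states = 5                     # states 0..4; episodes end past either edge
V = [0.0] * n_states
alpha, gamma = 0.1, 1.0

for episode in range(5000):
    s = n_states // 2            # start in the middle of the chain
    while True:
        s_next = s + random.choice([-1, 1])
        if s_next < 0:           # terminate on the left with reward 0
            r, done = 0.0, True
        elif s_next >= n_states: # terminate on the right with reward 1
            r, done = 1.0, True
        else:
            r, done = 0.0, False
        target = r if done else r + gamma * V[s_next]
        V[s] += alpha * (target - V[s])   # TD(0): move toward bootstrapped target
        if done:
            break
        s = s_next

print([round(v, 2) for v in V])  # approaches [1/6, 2/6, 3/6, 4/6, 5/6]
```

The update uses the bootstrapped target r + γV(s') rather than a full return, which is what distinguishes temporal difference learning from Monte Carlo value estimation.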
We combine three threads of research on approximate dynamic programming: sparse random sampling of states, value function and policy approximation using local models, and using lo...
In many practical reinforcement learning problems, the state space is too large to permit an exact representation of the value function, much less the time required to compute it. ...
Recently developed dual techniques allow us to evaluate a given sub-optimal dynamic portfolio policy by using the policy to construct an upper bound on the optimal value function....
Partially Observable Markov Decision Processes have been studied widely as a model for decision making under uncertainty, and a number of methods have been developed to find the s...
This paper introduces the even-odd POMDP, an approximation to POMDPs in which the world is assumed to be fully observable every other time step. The even-odd POMDP can be converte...