Search Sciweavers | Sciweavers

133 search results - page 3 / 27

» Hierarchical Policy Gradient Algorithms

AAAI
2011

145 views | Intelligent Agents

Policy Gradient Planning for Environmental Decision Making with Existing Simulators

14 years 6 months ago

Download www.cs.ubc.ca

In environmental and natural resource planning domains actions are taken at a large number of locations over multiple time periods. These problems have enormous state and action s...

Mark Crowley, David Poole

IJCAI
2001

163 views | Artificial Intelligence

Exploiting Multiple Secondary Reinforcers in Policy Gradient Reinforcement Learning

15 years 7 months ago

Download www.cs.colorado.edu

Most formulations of Reinforcement Learning depend on a single reinforcement reward value to guide the search for the optimal policy solution. If observation of this reward is rar...

Gregory Z. Grudic, Lyle H. Ungar

ECML
2007
Springer

192 views | Machine Learning

Policy Gradient Critics

16 years 6 days ago

Download www.idsia.ch

We present Policy Gradient Actor-Critic (PGAC), a new model-free Reinforcement Learning (RL) method for creating limited-memory stochastic policies for Partially Observable Markov ...

Daan Wierstra, Jürgen Schmidhuber

NN
2010
Springer

125 views | Neural Networks

Parameter-exploring policy gradients

15 years 4 months ago

Download www.kyb.mpg.de

We present a model-free reinforcement learning method for partially observable Markov decision problems. Our method estimates a likelihood gradient by sampling directly in paramet...

Frank Sehnke, Christian Osendorfer, Thomas Rü...

ICML
2009
IEEE

131 views | Machine Learning

Monte-Carlo simulation balancing

16 years 6 months ago

Download www.cs.ualberta.ca

In this paper we introduce the first algorithms for efficiently learning a simulation policy for Monte-Carlo search. Our main idea is to optimise the balance of a simulation polic...

David Silver, Gerald Tesauro
