Search Sciweavers | Sciweavers

1233 search results - page 103 / 247

» Feudal Reinforcement Learning

154

click to vote

ML
2002
ACM

100views Machine Learning» more ML 2002»

Structure in the Space of Value Functions

15 years 5 months ago

Download www.gatsby.ucl.ac.uk

Solving in an efficient manner many different optimal control tasks within the same underlying environment requires decomposing the environment into its computationally elemental ...

David J. Foster, Peter Dayan

claim paper

Read More »

165

click to vote

SMC
2007
IEEE

102views Control Systems» more SMC 2007»

An improved immune Q-learning algorithm

16 years 13 days ago

Download web2.uwindsor.ca

—Reinforcement learning is a framework in which an agent can learn behavior without knowledge on a task or an environment by exploration and exploitation. Striking a balance betw...

Zhengqiao Ji, Q. M. Jonathan Wu, Maher A. Sid-Ahme...

claim paper

Read More »

174

click to vote

IROS
2006
IEEE

113views Robotics» more IROS 2006»

Policy Gradient Methods for Robotics

16 years 6 days ago

Download www.cs.utah.edu

— The aquisition and improvement of motor skills and control policies for robotics from trial and error is of essential importance if robots should ever leave precisely pre-struc...

Jan Peters, Stefan Schaal

claim paper

Read More »

193

click to vote

ECML
2005
Springer

193views Machine Learning» more ECML 2005»

Natural Actor-Critic

15 years 11 months ago

Download www-clmc.usc.edu

This paper investigates a novel model-free reinforcement learning architecture, the Natural Actor-Critic. The actor updates are based on stochastic policy gradients employing Amari...

Jan Peters, Sethu Vijayakumar, Stefan Schaal

claim paper

Read More »

185

click to vote

ICML
2010
IEEE

222views Machine Learning» more ICML 2010»

Temporal Difference Bayesian Model Averaging: A Bayesian Perspective on Adapting Lambda

15 years 4 months ago

Download www.icml2010.org

Temporal difference (TD) algorithms are attractive for reinforcement learning due to their ease-of-implementation and use of "bootstrapped" return estimates to make effi...

Carlton Downey, Scott Sanner

claim paper

Read More »

« Prev « First page 103 / 247 Last » Next »

Sciweavers

Explore & Download

Productivity Tools

Sciweavers