Search Sciweavers | Sciweavers

49 search results - page 8 / 10

» Temporal Difference and Policy Search Methods for Reinforcem...

click to vote

ICML
2005
IEEE

119views Machine Learning» more ICML 2005»

Dynamic preferences in multi-criteria reinforcement learning

14 years 8 months ago

Download www.machinelearning.org

The current framework of reinforcement learning is based on maximizing the expected returns based on scalar rewards. But in many real world situations, tradeoffs must be made amon...

Sriraam Natarajan, Prasad Tadepalli

claim paper

Read More »

click to vote

NIPS
2001

206views Information Technology» more NIPS 2001»

Model-Free Least-Squares Policy Iteration

13 years 8 months ago

Download www.cs.duke.edu

We propose a new approach to reinforcement learning which combines least squares function approximation with policy iteration. Our method is model-free and completely off policy. ...

Michail G. Lagoudakis, Ronald Parr

claim paper

Read More »

click to vote

ECML
2005
Springer

193views Machine Learning» more ECML 2005»

Natural Actor-Critic

14 years 1 months ago

Download www-clmc.usc.edu

This paper investigates a novel model-free reinforcement learning architecture, the Natural Actor-Critic. The actor updates are based on stochastic policy gradients employing Amari...

Jan Peters, Sethu Vijayakumar, Stefan Schaal

claim paper

Read More »

click to vote

ATAL
2005
Springer

130views Intelligent Agents» more ATAL 2005»

Behavior transfer for value-function-based reinforcement learning

14 years 1 months ago

Download www.cs.huji.ac.il

Temporal difference (TD) learning methods [22] have become popular reinforcement learning techniques in recent years. TD methods have had some experimental successes and have been...

Matthew E. Taylor, Peter Stone

claim paper

Read More »

click to vote

IAT
2005
IEEE

180views Intelligent Agents» more IAT 2005»

Self-Organizing Cognitive Agents and Reinforcement Learning in Multi-Agent Environment

14 years 1 months ago

Download www3.ntu.edu.sg

This paper presents a self-organizing cognitive architecture, known as TD-FALCON, that learns to function through its interaction with the environment. TD-FALCON learns the value ...

Ah-Hwee Tan, Dan Xiao

claim paper

Read More »

« Prev « First page 8 / 10 Last » Next »

Sciweavers

Explore & Download

Productivity Tools

Document Tools

Image Tools

Sciweavers