Search Sciweavers | Sciweavers

1512 search results - page 178 / 303

» Qualitative reinforcement learning

170

Voted

IROS
2006
IEEE

113views Robotics» more IROS 2006»

Policy Gradient Methods for Robotics

15 years 11 months ago

Download www.cs.utah.edu

— The aquisition and improvement of motor skills and control policies for robotics from trial and error is of essential importance if robots should ever leave precisely pre-struc...

Jan Peters, Stefan Schaal

claim paper

Read More »

180

click to vote

ECML
2005
Springer

193views Machine Learning» more ECML 2005»

Natural Actor-Critic

15 years 11 months ago

Download www-clmc.usc.edu

This paper investigates a novel model-free reinforcement learning architecture, the Natural Actor-Critic. The actor updates are based on stochastic policy gradients employing Amari...

Jan Peters, Sethu Vijayakumar, Stefan Schaal

claim paper

Read More »

138

click to vote

ATAL
2008
Springer

104views Intelligent Agents» more ATAL 2008»

Expediting RL by using graphical structures

15 years 7 months ago

Download www.cs.washington.edu

The goal of Reinforcement learning (RL) is to maximize reward (minimize cost) in a Markov decision process (MDP) without knowing the underlying model a priori. RL algorithms tend ...

Peng Dai, Alexander L. Strehl, Judy Goldsmith

claim paper

Read More »

162

click to vote

AAAI
2006

127views Intelligent Agents» more AAAI 2006»

Modeling Human Decision Making in Cliff-Edge Environments

15 years 7 months ago

Download www.aaai.org

In this paper we propose a model for human learning and decision making in environments of repeated Cliff-Edge (CE) interactions. In CE environments, which include common daily in...

Ron Katz, Sarit Kraus

claim paper

Read More »

177

Voted

ICML
2010
IEEE

222views Machine Learning» more ICML 2010»

Temporal Difference Bayesian Model Averaging: A Bayesian Perspective on Adapting Lambda

15 years 3 months ago

Download www.icml2010.org

Temporal difference (TD) algorithms are attractive for reinforcement learning due to their ease-of-implementation and use of "bootstrapped" return estimates to make effi...

Carlton Downey, Scott Sanner

claim paper

Read More »

« Prev « First page 178 / 303 Last » Next »

Sciweavers

Explore & Download

Productivity Tools

Sciweavers