Search Sciweavers | Sciweavers

128 search results - page 5 / 26

» Hierarchically Optimal Average Reward Reinforcement Learning

NECO
2010

97 views

Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning

13 years 6 months ago

Download www.kyb.tuebingen.mpg.de

Most conventional Policy Gradient Reinforcement Learning (PGRL) algorithms neglect (or do not explicitly make use of) a term in the average reward gradient with respect to the pol...

Tetsuro Morimura, Eiji Uchibe, Junichiro Yoshimoto...

GECCO
2011
Springer

276 views, Optimization

Evolution of reward functions for reinforcement learning

12 years 11 months ago

Download hampshire.edu

The reward functions that drive reinforcement learning systems are generally derived directly from the descriptions of the problems that the systems are being used to solve. In so...

Scott Niekum, Lee Spector, Andrew G. Barto

ICML
1998
IEEE

268 views, Machine Learning

The MAXQ Method for Hierarchical Reinforcement Learning

14 years 8 months ago

Download www.cs.ualberta.ca

This paper presents a new approach to hierarchical reinforcement learning based on the MAXQ decomposition of the value function. The MAXQ decomposition has both a procedural seman...

Thomas G. Dietterich

CORR
2006
Springer

140 views, Education

Nearly optimal exploration-exploitation decision thresholds

13 years 7 months ago

Download www.idiap.ch

While in general trading off exploration and exploitation in reinforcement learning is hard, under some formulations relatively simple solutions exist. Optimal decision thresholds ...

Christos Dimitrakakis

posted by olethros

ICML
2006
IEEE

142 views, Machine Learning

An intrinsic reward mechanism for efficient exploration

14 years 8 months ago

Download www-anw.cs.umass.edu

How should a reinforcement learning agent act if its sole purpose is to efficiently learn an optimal policy for later use? In other words, how should it explore, to be able to exp...

Özgür Simsek, Andrew G. Barto
