Sciweavers

128 search results (page 5 of 26) for "Hierarchically Optimal Average Reward Reinforcement Learning"
NECO 2010
Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning
Most conventional Policy Gradient Reinforcement Learning (PGRL) algorithms neglect (or do not explicitly make use of) a term in the average reward gradient with respect to the pol...
Tetsuro Morimura, Eiji Uchibe, Junichiro Yoshimoto...
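As a rough illustration of the average-reward gradient this abstract refers to, the toy sketch below builds a two-state Markov chain whose transitions depend on a single policy parameter and estimates the gradient of the average reward by finite differences. The chain, the sigmoid parameterization, and all numbers are assumptions for illustration, not the construction used in the paper.

```python
import numpy as np

def stationary(P):
    # Solve pi @ P = pi with sum(pi) = 1 via least squares.
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def average_reward(theta):
    p = 1.0 / (1.0 + np.exp(-theta))          # policy-dependent transition prob
    P = np.array([[1.0 - p, p], [0.4, 0.6]])  # second row fixed (assumed)
    r = np.array([0.0, 1.0])                  # reward only in state 1
    return stationary(P) @ r

# Central finite-difference estimate of the average-reward gradient; the
# dependence of the stationary distribution on theta is folded in implicitly.
eps, theta = 1e-5, 0.3
grad = (average_reward(theta + eps) - average_reward(theta - eps)) / (2 * eps)
```

Because raising theta shifts probability mass toward the rewarding state, the estimated gradient comes out positive here.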
GECCO 2011 (Springer)
Evolution of reward functions for reinforcement learning
The reward functions that drive reinforcement learning systems are generally derived directly from the descriptions of the problems that the systems are being used to solve. In so...
Scott Niekum, Lee Spector, Andrew G. Barto
ICML 1998 (IEEE)
The MAXQ Method for Hierarchical Reinforcement Learning
This paper presents a new approach to hierarchical reinforcement learning based on the MAXQ decomposition of the value function. The MAXQ decomposition has both a procedural seman...
Thomas G. Dietterich
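The MAXQ decomposition mentioned in this abstract splits the value of a task into the value of a subtask plus a "completion" value. The tiny sketch below shows that recursion on a made-up two-level hierarchy; the state, actions, and completion values are invented for illustration, not taken from the paper.

```python
# Q(task, s, a) = V(a, s) + C(task, s, a)
# V(task, s)    = max_a Q(task, s, a) for composite tasks.

V_PRIMITIVE = {("east", 0): 1.0, ("west", 0): 0.0}  # expected one-step rewards
C_ROOT = {(0, "east"): 2.0, (0, "west"): 3.5}       # completion values (assumed)

def V(task, s):
    if (task, s) in V_PRIMITIVE:                    # primitive action
        return V_PRIMITIVE[(task, s)]
    return max(Q(task, s, a) for a in ("east", "west"))  # composite task

def Q(task, s, a):
    return V(a, s) + C_ROOT[(s, a)]

# V("root", 0) = max(1.0 + 2.0, 0.0 + 3.5) = 3.5
```

In a full implementation the completion values would themselves be learned recursively rather than fixed by hand.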
CORR 2006 (Springer)
Nearly optimal exploration-exploitation decision thresholds
While in general trading off exploration and exploitation in reinforcement learning is hard, under some formulations relatively simple solutions exist. Optimal decision thresholds ...
Christos Dimitrakakis
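To make the idea of a simple decision threshold concrete, here is a generic explore-then-commit sketch for a two-armed Bernoulli bandit: sample both arms until the gap between empirical means crosses a threshold, then commit. The arm means, the threshold, and the minimum-sample rule are all assumptions for illustration, not the paper's specific construction.

```python
import random

random.seed(0)

def explore_then_commit(means, threshold=0.3, min_pulls=20, max_pulls=2000):
    counts = [0, 0]
    sums = [0.0, 0.0]
    for t in range(max_pulls):
        arm = t % 2                       # round-robin exploration
        sums[arm] += 1.0 if random.random() < means[arm] else 0.0
        counts[arm] += 1
        if min(counts) >= min_pulls:      # wait for a minimal sample size
            est = [sums[i] / counts[i] for i in range(2)]
            if abs(est[0] - est[1]) >= threshold:
                return max(range(2), key=lambda i: est[i])  # commit
    return max(range(2), key=lambda i: sums[i] / counts[i])

best = explore_then_commit([0.2, 0.8])
```

With arms this far apart the threshold is crossed almost immediately after the minimum sample, and the rule commits to the better arm.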
ICML 2006 (IEEE)
An intrinsic reward mechanism for efficient exploration
How should a reinforcement learning agent act if its sole purpose is to efficiently learn an optimal policy for later use? In other words, how should it explore, to be able to exp...
Özgür Simsek, Andrew G. Barto
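To illustrate the general idea of an intrinsic reward for exploration, the sketch below uses a count-based visit bonus, a common heuristic in which rarely visited states look temporarily rewarding. This is only a generic stand-in, not the specific mechanism proposed in the paper above.

```python
import math

BETA = 1.0          # bonus scale (assumed)
visit_counts = {}

def intrinsic_bonus(state):
    # Pay bonus(s) = BETA / sqrt(N(s)) on top of any external reward,
    # so the bonus decays as a state is visited more often.
    visit_counts[state] = visit_counts.get(state, 0) + 1
    return BETA / math.sqrt(visit_counts[state])

first = intrinsic_bonus("s0")                            # first visit
fourth = [intrinsic_bonus("s0") for _ in range(3)][-1]   # fourth visit
```

An agent maximizing external reward plus this bonus is driven toward under-visited states, which is the exploratory effect intrinsic rewards are meant to produce.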