Sciweavers

179

UAI
2001

129views Artificial Intelligence» more UAI 2001»

The Optimal Reward Baseline for Gradient-Based Reinforcement Learning

15 years 8 months ago

There exist a number of reinforcement learning algorithms which learn by climbing the gradient of expected reward. Their long-run convergence has been proved, even in partially ob...

Lex Weaver, Nigel Tao

claim paper

Read More »

Sciweavers

Explore & Download

Productivity Tools

Sciweavers