Reducing policy degradation in neuro-dynamic programming


Download ml.informatik.uni-freiburg.de

We focus on neuro-dynamic programming methods for learning state-action value functions and outline some of the inherent problems that arise when reinforcement learning is combined with function approximation. In an attempt to overcome some of these problems, we develop a reinforcement learning method that monitors the learning process, enables the learner to reflect on whether it is better to cease learning, and thus obtains more stable learning results.
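The abstract does not spell out how the monitoring works, so the following is only a minimal sketch of the general idea, assuming a simple early-stopping rule: evaluate the current policy periodically, keep a snapshot of the best-performing parameters, and cease learning once performance has failed to improve for a number of consecutive evaluations. This is not the authors' algorithm; `LearningMonitor`, `train_with_monitor`, and the `patience` parameter are hypothetical names introduced for illustration.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class LearningMonitor:
    """Tracks policy evaluations and keeps the best snapshot seen so far.

    Hypothetical helper: an early-stopping rule, not the paper's method.
    """
    patience: int = 3  # assumed: evaluations without improvement tolerated
    best_score: float = float("-inf")
    best_params: Optional[Any] = None
    _stale: int = field(default=0, init=False)

    def update(self, score: float, params: Any) -> bool:
        """Record one evaluation; return True if learning should stop."""
        if score > self.best_score:
            # New best policy: remember its parameters and reset the counter.
            self.best_score = score
            self.best_params = copy.deepcopy(params)
            self._stale = 0
        else:
            # Performance did not improve (possible policy degradation).
            self._stale += 1
        return self._stale >= self.patience


def train_with_monitor(scores, patience=3):
    """Simulate training from a given sequence of evaluation scores."""
    monitor = LearningMonitor(patience=patience)
    for step, score in enumerate(scores):
        params = {"step": step}  # stand-in for the approximator's weights
        if monitor.update(score, params):
            break  # cease learning and fall back to the best snapshot
    return monitor.best_score, monitor.best_params
```

In a real learner, `scores` would come from evaluating the greedy policy after each batch of value-function updates, and `params` would be the weights of the function approximator.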

Thomas Gabel, Martin Riedmiller


ESANN 2006 | ESANN 2007 | Reinforcement Learning | Reinforcement Learning Method | State-action Value Functions |


Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2006
Where	ESANN
Authors	Thomas Gabel, Martin Riedmiller
