Filtered Reinforcement Learning

Reinforcement learning (RL) algorithms attempt to assign credit for rewards to the actions that contributed to them. Thus far, credit assignment has been done in one of two ways: uniformly, or using a discounting model that assigns exponentially more credit to recent actions. This paper demonstrates an alternative approach to temporal credit assignment, taking advantage of exact or approximate prior information about correct credit assignment. Infinite impulse response (IIR) filters are used to model credit assignment information. IIR filters generalise exponentially discounting eligibility traces to arbitrary credit assignment models. This approach can be applied to any RL algorithm that employs an eligibility trace. The use of IIR credit assignment filters is explored using both the GPOMDP policy-gradient algorithm and the Sarsa(λ) temporal-difference algorithm. A drop in bias and variance of value or gradient estimates is demonstrated, resulting in faster convergence...
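
The claim that IIR filters generalise exponentially discounted eligibility traces can be illustrated with a small sketch. It is not taken from the paper: the function names `iir_trace` and `discounted_trace`, the scalar input signal, and the coefficient convention (the usual signal-processing form with the feedback terms subtracted) are all assumptions made for illustration. Under that convention, a trace with discount `beta` is the one-tap filter `b = [1.0]`, `a = [-beta]`, and other coefficient choices encode other credit assignment models.

```python
from collections import deque


def iir_trace(signal, b, a):
    """Hypothetical helper: filter a scalar signal with
    y[t] = sum_k b[k] * x[t-k] - sum_k a[k] * y[t-1-k].
    """
    # Histories start at zero; index 0 always holds the most recent value.
    x_hist = deque([0.0] * len(b), maxlen=len(b))
    y_hist = deque([0.0] * len(a), maxlen=len(a))
    out = []
    for x in signal:
        x_hist.appendleft(x)
        # Feed-forward part: weighted sum of current and past inputs.
        y = sum(bk * xk for bk, xk in zip(b, x_hist))
        # Feedback part: weighted sum of past outputs (subtracted).
        y -= sum(ak * yk for ak, yk in zip(a, y_hist))
        y_hist.appendleft(y)
        out.append(y)
    return out


def discounted_trace(signal, beta):
    """Standard exponentially discounted trace: z[t] = beta * z[t-1] + x[t]."""
    z, out = 0.0, []
    for x in signal:
        z = beta * z + x
        out.append(z)
    return out


if __name__ == "__main__":
    signal = [1.0, 0.0, 0.0, 2.0]
    # The discounted trace is the special case b = [1], a = [-beta].
    print(discounted_trace(signal, 0.9))
    print(iir_trace(signal, b=[1.0], a=[-0.9]))
    # Assumed example of prior knowledge: credit only the input from
    # exactly two steps earlier (a pure delay, with no feedback terms).
    print(iir_trace([1.0, 0.0, 0.0, 0.0], b=[0.0, 0.0, 1.0], a=[]))
```

In an algorithm that keeps an eligibility trace, the scalar input here would stand in for whatever quantity the trace accumulates at each step; the sketch only shows the filtering itself, not GPOMDP or Sarsa(λ).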

Douglas Aberdeen

Credit Assignment | ECML 2004 | IIR Credit Assignment | Temporal Credit Assignment

Post Info

Added	01 Jul 2010
Updated	01 Jul 2010
Type	Conference
Year	2004
Where	ECML
Authors	Douglas Aberdeen
