Eligibility traces have been shown to speed reinforcement learning, to make it more robust to hidden states, and to provide a link between Monte Carlo and temporal-difference methods. Here we generalize eligibility traces to off-policy learning, in which one learns about a policy different from the policy that generates the data. Off-policy methods can greatly multiply learning, as many policies can be learned about from the same data stream, and have been identified as particularly useful for learning about subgoals and temporally extended macro-actions. In this paper we consider the off-policy version of the policy evaluation problem, for which only one eligibility trace algorithm is known, a Monte Carlo method. We analyze and compare this and four new eligibility trace algorithms, emphasizing their relationships to the classical statistical technique known as importance sampling. Our main results are 1) to establish the consistency and bias properties of the new methods and 2) to e...
Doina Precup, Richard S. Sutton, Satinder P. Singh
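
Since the abstract centers on importance sampling for off-policy policy evaluation, a minimal sketch may help fix ideas. The sketch below is not one of the paper's eligibility-trace algorithms; it is the baseline ordinary importance-sampling Monte Carlo estimator mentioned in the abstract, run on a toy tabular problem. The MDP dynamics, the target policy `pi`, and the behavior policy `b` are all invented here for illustration only.

```python
# Minimal sketch: tabular off-policy Monte Carlo policy evaluation via
# ordinary importance sampling. Everything below (the toy MDP, pi, b) is
# an illustrative assumption, not part of the paper.
import random
from collections import defaultdict

random.seed(0)

N_STATES, GAMMA = 4, 0.9  # state N_STATES-1 is terminal

def step(s, a):
    """Toy transition: action 0 tends to advance, action 1 resets to state 0."""
    s_next = min(s + 1, N_STATES - 1) if a == 0 else 0
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    done = s_next == N_STATES - 1
    return s_next, reward, done

# Target policy pi (to be evaluated) and behavior policy b (generates the data),
# each given as per-state probabilities over the two actions.
pi = {s: [0.9, 0.1] for s in range(N_STATES)}
b  = {s: [0.5, 0.5] for s in range(N_STATES)}

def evaluate(n_episodes=5000):
    returns_sum = defaultdict(float)
    counts = defaultdict(int)
    for _ in range(n_episodes):
        # Generate one episode under the behavior policy b.
        s, traj, done = 0, [], False
        while not done:
            a = 0 if random.random() < b[s][0] else 1
            s_next, r, done = step(s, a)
            traj.append((s, a, r))
            s = s_next
        # Walk backwards, accumulating the return G and the importance
        # sampling ratio rho = prod_t pi(a_t|s_t) / b(a_t|s_t) over the tail.
        G, rho = 0.0, 1.0
        for s_t, a_t, r_t in reversed(traj):
            G = r_t + GAMMA * G
            rho *= pi[s_t][a_t] / b[s_t][a_t]
            returns_sum[s_t] += rho * G   # ordinary (unweighted) IS estimate
            counts[s_t] += 1
    return {s: returns_sum[s] / counts[s] for s in counts}

if __name__ == "__main__":
    V = evaluate()
    for s in sorted(V):
        print(f"V_pi({s}) approx {V[s]:.3f}")
```

Each visited state's value under the target policy is estimated from behavior-policy episodes by weighting the observed return with the product of likelihood ratios pi(a|s)/b(a|s) over the remainder of the episode; the algorithms the abstract refers to relate such importance-sampling corrections to eligibility traces.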