COLT
2000
Springer

Bias-Variance Error Bounds for Temporal Difference Updates

We give the first rigorous upper bounds on the error of temporal difference (TD) algorithms for policy evaluation as a function of the amount of experience. These upper bounds prove exponentially fast convergence, with both the rate of convergence and the asymptote strongly dependent on the length of the backups k or the parameter λ. Our bounds give formal verification to the long-standing intuition that TD methods are subject to a "bias-variance" trade-off, and they lead to schedules for k and λ that are predicted to be better than any fixed values for these parameters. We give preliminary experimental confirmation of our theory for a version of the random walk problem.
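To make the role of the backup length k concrete, here is a minimal Python sketch of generic k-step TD policy evaluation on a small symmetric random walk. It is an illustration only, not the paper's algorithm or experimental setup: the chain size, reward scheme, step size `alpha`, and the function name `k_step_td_random_walk` are all assumptions chosen for this example.

```python
import random


def k_step_td_random_walk(k, episodes=5000, n_states=5, alpha=0.02,
                          gamma=1.0, seed=0):
    """Estimate state values of a symmetric random walk with k-step TD backups.

    Hypothetical setup: states 1..n_states are non-terminal, states 0 and
    n_states + 1 are terminal, and the only reward is 1 for exiting right.
    """
    rng = random.Random(seed)
    values = [0.0] * (n_states + 2)  # terminal values stay at 0
    for _ in range(episodes):
        # Generate one full episode starting from the middle state.
        state = (n_states + 1) // 2
        states = [state]
        rewards = []
        while state not in (0, n_states + 1):
            state += rng.choice((-1, 1))
            rewards.append(1.0 if state == n_states + 1 else 0.0)
            states.append(state)
        # Sweep over the episode, applying one k-step backup per visited state.
        horizon = len(rewards)
        for t in range(horizon):
            end = min(t + k, horizon)
            # Up to k sampled rewards, then bootstrap from the current estimate.
            target = sum(gamma ** (i - t) * rewards[i] for i in range(t, end))
            target += gamma ** (end - t) * values[states[end]]
            values[states[t]] += alpha * (target - values[states[t]])
    return values[1:-1]


if __name__ == "__main__":
    for k in (1, 3, 10):
        print(k, [round(v, 3) for v in k_step_td_random_walk(k)])
```

In this sketch, a small k leans on the current value estimates (more bias early on), while a large k leans on sampled returns (more variance). That is the trade-off the abstract refers to, and a schedule would vary k over time instead of fixing it.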
Michael J. Kearns, Satinder P. Singh
Added: 02 Aug 2010
Updated: 02 Aug 2010
Type: Conference
Year: 2000
Where: COLT
Authors: Michael J. Kearns, Satinder P. Singh