Investigating practical, linear temporal difference learning

Off-policy reinforcement learning has many applications, including learning from demonstration, learning multiple goal-seeking policies in parallel, and representing predictive knowledge. Recently there has been a proliferation of new policy-evaluation algorithms that fill a longstanding algorithmic void in reinforcement learning: combining robustness to off-policy sampling, function approximation, linear complexity, and temporal difference (TD) updates. This paper contains two main contributions. First, we derive two new hybrid TD policy-evaluation algorithms, which fill a gap in this collection of algorithms. Second, we perform an empirical comparison to elicit which of these new linear TD methods should be preferred in different situations, and make concrete suggestions about practical use.

Keywords: Reinforcement learning; temporal difference learning; off-policy learning
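To make the four ingredients named in the abstract concrete, here is a minimal sketch of a single off-policy TD(0) update with linear function approximation and an importance-sampling ratio. It is not one of the paper's hybrid algorithms; it is the plain importance-weighted TD(0) baseline, which costs time linear in the number of features per step but can diverge under off-policy sampling, the weakness the newer methods are meant to address. The function name, argument names, and the example numbers are assumptions made for illustration only.

```python
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Inner product of two equal-length feature or weight vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def off_policy_td0_update(
    w: List[float],       # current weight vector (value estimate is w . x)
    x: List[float],       # features of the current state
    reward: float,        # reward observed on the transition
    x_next: List[float],  # features of the next state
    gamma: float,         # discount factor
    alpha: float,         # step size
    rho: float,           # importance ratio pi(a|s) / b(a|s), assumed given
) -> List[float]:
    """Return the weights after one importance-weighted linear TD(0) step.

    Illustrative baseline only: O(n) time and memory in the number of
    features, but without the correction terms that gradient-TD style
    methods add for stability under off-policy sampling.
    """
    # TD error: bootstrapped target minus the current estimate.
    delta = reward + gamma * dot(w, x_next) - dot(w, x)
    # Scale the usual TD(0) step by the importance-sampling ratio.
    return [wi + alpha * rho * delta * xi for wi, xi in zip(w, x)]


# Hypothetical two-feature transition, chosen so the arithmetic is easy to check.
w_new = off_policy_td0_update(
    w=[0.0, 0.0], x=[1.0, 0.0], reward=1.0, x_next=[0.0, 1.0],
    gamma=0.9, alpha=0.5, rho=2.0,
)
print(w_new)  # [1.0, 0.0]
```

With zero initial weights the TD error is 1.0, so only the weight of the active feature moves, by alpha * rho * delta = 1.0. Setting rho to 0 (an action the target policy would never take) leaves the weights unchanged.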

Adam M. White, Martha White

CORR 2016 | Education |

Post Info

Added	31 Mar 2016
Updated	31 Mar 2016
Type	Journal
Year	2016
Where	CORR
Authors	Adam M. White, Martha White
