ECML 2004, Springer

Convergence and Divergence in Standard and Averaging Reinforcement Learning

Although tabular reinforcement learning (RL) methods have been proven to converge to an optimal policy, combining certain conventional reinforcement learning techniques with function approximators can lead to divergence. In this paper we show why off-policy RL methods combined with linear function approximators can diverge. Furthermore, we analyze two different types of updates: standard and averaging RL updates. Although averaging RL methods will not diverge, we show that they can converge to incorrect value functions. In our experiments we compare standard and averaging value iteration (VI) with CMACs; the results show that for small values of the discount factor averaging VI works better, whereas for large values of the discount factor standard VI performs better, although it does not always converge.
Marco Wiering
Type Conference
Year 2004
Where ECML
Authors Marco Wiering
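
Below is a minimal sketch of the contrast the abstract draws between standard and averaging updates, assuming binary tile-coding (CMAC-style) features and a scalar bootstrap target; the exact update rules used in the paper may differ, and the step sizes and feature vector here are hypothetical.

```python
import numpy as np

def standard_update(w, phi, target, alpha):
    """Standard update with binary CMAC features.

    Every active tile is moved by the full step alpha * delta, so the
    predicted value changes by alpha * delta * n_active, which can
    overshoot the target; with off-policy bootstrapping this is one
    source of divergence.
    """
    delta = target - w @ phi          # TD error for this state
    return w + alpha * delta * phi    # phi is 0/1, so only active tiles move

def averaging_update(w, phi, target, alpha):
    """Averaging update: the step is divided by the number of active tiles.

    The new prediction is a convex combination of the old prediction and
    the target (for 0 < alpha <= 1), so the update cannot diverge,
    although it may settle on a biased value function.
    """
    n_active = phi.sum()
    delta = target - w @ phi
    return w + (alpha / n_active) * delta * phi

# Toy illustration (hypothetical numbers): 3 active tiles out of 8.
w = np.zeros(8)
phi = np.array([1, 0, 1, 0, 0, 1, 0, 0], dtype=float)
target = 1.0
w_std = standard_update(w, phi, target, alpha=0.5)
w_avg = averaging_update(w, phi, target, alpha=0.5)
print(w_std @ phi)  # 1.5 -> overshoots the target of 1.0
print(w_avg @ phi)  # 0.5 -> moves halfway toward the target
```

The overshoot in the standard case grows with the number of active tiles, while the averaging case stays between the old prediction and the target regardless of how many tiles are active, which illustrates why averaging updates cannot diverge but may converge to a different fixed point.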