Online exploration in least-squares policy iteration

One of the key problems in reinforcement learning is balancing exploration and exploitation. Another is learning and acting in large or even continuous Markov decision processes (MDPs), where compact function approximation must be used. In this paper, we provide a practical solution to exploring large MDPs by integrating a powerful exploration technique, Rmax, into a state-of-the-art learning algorithm, least-squares policy iteration (LSPI). The approach combines the strengths of both methods, and experiments on several benchmark problems show its effectiveness and superiority over LSPI paired with two other popular exploration rules.

Categories and Subject Descriptors: I.2.6 [Computing Methodologies]: Artificial Intelligence - Learning; I.2.8 [Computing Methodologies]: Artificial Intelligence - Problem Solving, Control Methods, and Search

General Terms: Algorithms

Keywords: Least-Squares Policy Iteration (LSPI), Exploration, PAC-MDP, Markov Decision Processes, Reinforcement Learning
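The abstract only sketches how Rmax and LSPI are combined. As a rough illustration, here is a minimal sketch, not the authors' implementation, of one way an Rmax-style known/unknown test can be folded into the LSTDQ step of LSPI. The toy chain MDP, one-hot feature map, and the names V_MAX and KNOWN_THRESHOLD are illustrative assumptions, not taken from the paper.

    # A minimal sketch (assumption-laden, not the paper's code): Rmax-style
    # optimism inside the LSTDQ evaluation step of LSPI on a toy chain MDP.
    import numpy as np

    N_STATES, N_ACTIONS = 5, 2      # toy chain: actions 0 = left, 1 = right
    GAMMA, R_MAX = 0.9, 1.0
    V_MAX = R_MAX / (1.0 - GAMMA)   # optimistic value for "unknown" pairs
    KNOWN_THRESHOLD = 5             # visits before a pair counts as known

    def features(s, a):
        """One-hot (tabular) features, so least squares is exact here."""
        phi = np.zeros(N_STATES * N_ACTIONS)
        phi[s * N_ACTIONS + a] = 1.0
        return phi

    def q_value(w, counts, s, a):
        """Rmax rule: insufficiently visited pairs get the value V_MAX."""
        if counts[s, a] < KNOWN_THRESHOLD:
            return V_MAX
        return w @ features(s, a)

    def lstdq_rmax(samples, w_old, counts):
        """One LSTDQ sweep over (s, a, r, s') samples with optimistic backups."""
        k = N_STATES * N_ACTIONS
        A = 1e-6 * np.eye(k)        # small ridge term keeps A invertible
        b = np.zeros(k)
        for s, a, r, s2 in samples:
            phi = features(s, a)
            # greedy successor action under the current optimistic Q-values
            a2 = max(range(N_ACTIONS),
                     key=lambda x: q_value(w_old, counts, s2, x))
            if counts[s2, a2] < KNOWN_THRESHOLD:
                # unknown successor pair: back up the optimistic value directly
                A += np.outer(phi, phi)
                b += phi * (r + GAMMA * V_MAX)
            else:
                A += np.outer(phi, phi - GAMMA * features(s2, a2))
                b += phi * r
        return np.linalg.solve(A, b)

Acting greedily with respect to q_value drives the agent toward unknown state-action pairs until their visit counts cross the threshold; once every pair is known, the procedure reduces to standard LSPI.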
Added 26 May 2010
Updated 26 May 2010
Type Conference
Year 2009
Where ATAL
Publisher Springer
Authors Lihong Li, Michael L. Littman, Christopher R. Mansley