
ICML
2005
IEEE

Reinforcement learning with Gaussian processes

Gaussian Process Temporal Difference (GPTD) learning offers a Bayesian solution to the policy evaluation problem of reinforcement learning. In this paper we extend the GPTD framework by addressing two pressing issues that were not adequately treated in the original GPTD paper (Engel et al., 2003). The first is stochasticity in the state transitions; the second is action selection and policy improvement. We present a new generative model for the value function, derived from its relation to the discounted return. We derive a corresponding on-line algorithm for learning the posterior moments of the value Gaussian process. We also present a SARSA-based extension of GPTD, termed GPSARSA, that allows the selection of actions and the gradual improvement of policies without requiring a world model.
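The abstract names the generative model without writing it out. The following is a minimal batch sketch, assuming the model has the form r_i = V(x_i) - γ V(x_{i+1}) + N_i, where V has a zero-mean GP prior with kernel k and the noise vector is N = H ΔV with ΔV ~ N(0, σ²I) independent of V. Here H is the t × (t+1) matrix with 1 on the diagonal and -γ on the superdiagonal. Under that assumption the posterior mean is k(x)ᵀ Hᵀ G⁻¹ r and the posterior variance is k(x,x) - k(x)ᵀ Hᵀ G⁻¹ H k(x), with G = H (K + σ²I) Hᵀ. The paper derives an on-line (recursive) algorithm. This sketch instead does a direct batch solve on a single non-terminating trajectory. The RBF kernel, the values of γ and σ, and the names `rbf`, `solve` and `gptd_posterior` are all hypothetical choices made for illustration.

```python
import math


def rbf(x, y, length_scale=1.0):
    """Squared-exponential kernel on scalar states (an assumed choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * length_scale ** 2))


def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def gptd_posterior(states, rewards, x, gamma=0.9, sigma=0.1, kernel=rbf):
    """Batch posterior mean and variance of V(x) from one trajectory.

    Assumed model (hypothetical sketch): r_i = V(x_i) - gamma * V(x_{i+1}) + N_i,
    with N = H dV and dV ~ N(0, sigma^2 I), so Cov(N) = sigma^2 H H^T.
    Episode termination is not handled.
    """
    t = len(rewards)
    if len(states) != t + 1:
        raise ValueError("need exactly one more state than rewards")
    n = t + 1

    def apply_h(v):
        # H @ v: row i is v[i] - gamma * v[i + 1].
        return [v[i] - gamma * v[i + 1] for i in range(t)]

    def apply_ht(u):
        # H^T @ u.
        out = [0.0] * n
        for i in range(t):
            out[i] += u[i]
            out[i + 1] -= gamma * u[i]
        return out

    # K + sigma^2 I: prior covariance of visited values plus the dV noise.
    k_noisy = [[kernel(states[i], states[j]) + (sigma ** 2 if i == j else 0.0)
                for j in range(n)] for i in range(n)]

    # G = H (K + sigma^2 I) H^T = H K H^T + Cov(N), built column by column.
    g_cols = []
    for j in range(t):
        e = [0.0] * t
        e[j] = 1.0
        h_t_e = apply_ht(e)
        col = [sum(k_noisy[i][l] * h_t_e[l] for l in range(n)) for i in range(n)]
        g_cols.append(apply_h(col))
    g = [[g_cols[j][i] for j in range(t)] for i in range(t)]

    alpha = apply_ht(solve(g, rewards))  # alpha = H^T G^{-1} r
    k_x = [kernel(s, x) for s in states]
    mean = sum(a * k for a, k in zip(alpha, k_x))
    u = apply_h(k_x)  # H k(x)
    var = kernel(x, x) - sum(ui * wi for ui, wi in zip(u, solve(g, u)))
    return mean, var


if __name__ == "__main__":
    # Hypothetical 1-D chain: reward 1 only on the last transition.
    states = [0.0, 1.0, 2.0, 3.0]
    rewards = [0.0, 0.0, 1.0]
    for s in states:
        m, v = gptd_posterior(states, rewards, s)
        print(f"V({s}) ~ mean {m:.3f}, var {v:.3f}")
```

Folding the noise covariance into G = H (K + σ²I) Hᵀ keeps G positive definite, because H has full row rank. A plain elimination solve is therefore enough for a sketch of this size. For GPSARSA, one plausible reading is that the same computation is run over state-action pairs with a kernel defined on (x, a). That is an assumption here, not something the abstract spells out.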
Yaakov Engel, Shie Mannor, Ron Meir
Type Conference
Year 2005
Where ICML