Particle Filter-based Policy Gradient in POMDPs

Download eprints.pascal-network.org

Our setting is a Partially Observable Markov Decision Process with continuous state, observation and action spaces. Decisions are based on a Particle Filter for estimating the belief state given past observations. We consider a policy gradient approach for parameterized policy optimization. For that purpose, we investigate sensitivity analysis of the performance measure with respect to the parameters of the policy, focusing on Finite Difference (FD) techniques. We show that the naive FD is subject to variance explosion because of the non-smoothness of the resampling procedure. We propose a more sophisticated FD method which overcomes this problem and establish its consistency.
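The variance problem described above can be explored numerically. The following is a minimal sketch, not the estimator proposed in the paper: it builds a toy one-dimensional POMDP, runs a bootstrap particle filter with multinomial resampling, and forms a naive central finite-difference estimate of the performance gradient using the same random seed on both sides. The dynamics, the nonlinear observation `sin(x)`, the linear-in-belief policy `action = -theta * belief_mean`, the noise levels, and all function names are assumptions made for illustration.

```python
import math
import random


def simulate_return(theta, seed, n_particles=50, horizon=20):
    """One noisy estimate of the return J(theta) for a belief-based policy.

    Toy model (assumed, not from the paper):
      state:       x' = x + action + N(0, 0.3^2)
      observation: y  = sin(x') + N(0, 0.5^2)
      reward:      -x'^2
    """
    rng = random.Random(seed)  # same seed => same underlying random numbers
    x = rng.gauss(0.0, 1.0)    # hidden state
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    total = 0.0
    for _ in range(horizon):
        # The policy only sees the particle estimate of the belief mean.
        belief_mean = sum(particles) / n_particles
        action = -theta * belief_mean
        x = x + action + rng.gauss(0.0, 0.3)
        y = math.sin(x) + rng.gauss(0.0, 0.5)
        total += -x * x
        # Prediction step: push every particle through the dynamics.
        particles = [p + action + rng.gauss(0.0, 0.3) for p in particles]
        # Weighting step: Gaussian observation likelihood (tiny floor
        # keeps the total weight strictly positive).
        weights = [
            math.exp(-0.5 * ((y - math.sin(p)) / 0.5) ** 2) + 1e-300
            for p in particles
        ]
        # Multinomial resampling: the selected indices change abruptly
        # with theta, which makes the return a non-smooth function of theta.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return total


def naive_fd(theta, h, seed):
    """Naive central finite difference with a shared seed on both sides."""
    plus = simulate_return(theta + h, seed)
    minus = simulate_return(theta - h, seed)
    return (plus - minus) / (2.0 * h)


def fd_spread(theta, h, seeds):
    """Sample mean and variance of the naive FD estimate across seeds."""
    estimates = [naive_fd(theta, h, s) for s in seeds]
    mean = sum(estimates) / len(estimates)
    var = sum((e - mean) ** 2 for e in estimates) / (len(estimates) - 1)
    return mean, var
```

Calling `fd_spread(0.5, h, range(200))` for a decreasing sequence of step sizes `h` gives one way to check how the spread of the naive estimate behaves in this toy setting. Any jump in the simulated return caused by a change in the resampled indices is divided by `2 * h`, which is the mechanism the abstract attributes the variance explosion to.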

Pierre-Arnaud Coquelin, Romain Deguest, Rémi Munos

Information Technology | NIPS 2008 | Parameterized Policy Optimization | Partially Observable Markov Decision Process | Sophisticated FD Method

Post Info

Added	30 Oct 2010
Updated	30 Oct 2010
Type	Conference
Year	2008
Where	NIPS
Authors	Pierre-Arnaud Coquelin, Romain Deguest, Rémi Munos
