Sciweavers

CORR
2010
Springer

Dynamic Policy Programming

In this paper, we consider the problem of planning and learning in infinite-horizon discounted-reward Markov decision problems. We propose a novel iterative direct policy-search approach, called dynamic policy programming (DPP). DPP is, to the best of our knowledge, the first convergent direct policy-search method that uses a Bellman-like iteration technique and at the same time is compatible with function approximation. For the tabular case, we prove that DPP converges asymptotically to the optimal policy. We numerically compare the performance of DPP to other state-of-the-art approximate dynamic programming methods on the mountain-car problem with linear function approximation and Gaussian basis functions. We observe that, unlike other approximate dynamic programming methods, DPP converges to a near-optimal policy, even when the basis functions are randomly placed. We conclude that DPP, combined with function approximation, asymptotically outperforms other approximate dynamic pro...
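The abstract does not state the update rule, so the following is only a rough sketch of what a tabular Bellman-like iteration on action preferences could look like. It assumes the Boltzmann soft-max form of the DPP update as given in later published versions of the method, which this listing does not confirm. The two-state MDP, the arrays `P` and `r`, and the parameters `eta` and `n_iters` are hypothetical and chosen purely for illustration.

```python
import numpy as np

def boltzmann_softmax(psi, eta):
    """Softmax-weighted average of action preferences, one value per state."""
    # Subtract the per-state maximum before exponentiating for numerical stability.
    w = np.exp(eta * (psi - psi.max(axis=1, keepdims=True)))
    return (w * psi).sum(axis=1) / w.sum(axis=1)

def tabular_dpp(P, r, gamma=0.9, eta=1.0, n_iters=500):
    """Iterate action preferences psi[s, a] (assumed soft-max form of the update).

    P has shape (S, A, S) with P[s, a, s2] the transition probability,
    r has shape (S, A) with the expected immediate reward.
    """
    psi = np.zeros_like(r, dtype=float)
    for _ in range(n_iters):
        m = boltzmann_softmax(psi, eta)               # shape (S,)
        # Bellman-like step: remove the current soft value of the state,
        # add the reward and the discounted soft value of the next state.
        psi = psi - m[:, None] + r + gamma * (P @ m)
    return psi

# Hypothetical toy MDP: 2 states, 2 actions, deterministic transitions.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0   # state 0, action 0: stay in state 0
P[0, 1, 1] = 1.0   # state 0, action 1: move to state 1
P[1, 0, 0] = 1.0   # state 1, action 0: move to state 0
P[1, 1, 1] = 1.0   # state 1, action 1: stay in state 1
r = np.array([[0.0, 0.0],
              [0.0, 1.0]])   # only staying in state 1 is rewarded

psi = tabular_dpp(P, r)
policy = psi.argmax(axis=1)  # greedy policy read off the preferences
print(policy, psi.max(axis=1))
```

For this toy problem with discount 0.9, the optimal values are 9 for state 0 and 10 for state 1, both reached by taking action 1, so the greedy policy from the sketch can be checked against those numbers. Preferences for suboptimal actions keep decreasing across iterations, which is why the sketch reads the policy from the argmax rather than from the raw preference values.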
Mohammad Gheshlaghi Azar, Hilbert J. Kappen
Added 09 Dec 2010
Updated 09 Dec 2010
Type Journal
Year 2010
Where CORR