Policy Search by Dynamic Programming


We consider the policy search approach to reinforcement learning. We show that if a “baseline distribution” is given (indicating roughly how often we expect a good policy to visit each state), then we can derive a policy search algorithm that terminates in a finite number of steps, and for which we can provide non-trivial performance guarantees. We also demonstrate this algorithm on several grid-world POMDPs, a planar biped walking robot, and a double-pole balancing problem.
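The abstract does not spell out the algorithm's steps, so the following is only a minimal sketch of the general idea as we understand it: work backward in time and, at each step, pick from a fixed policy class the policy that maximizes expected value-to-go under that step's baseline distribution. Everything specific here is an assumption made for illustration: the function name `psdp`, the array layouts, the tiny known MDP with exact evaluation, and the finite candidate list standing in for the policy class. The paper's own experiments and estimators are not reproduced.

```python
import numpy as np

def psdp(P, R, mu, candidates):
    """Hypothetical tabular sketch of policy search by dynamic programming.

    P:          (A, S, S) array, P[a, s, s2] = transition probability
    R:          (S, A) array of immediate rewards
    mu:         (T, S) array, mu[t] = baseline state distribution at step t
    candidates: (K, S) int array, each row a deterministic policy (assumed class)
    Returns the chosen policy per time step and the resulting values at t = 0.
    """
    horizon, n_states = mu.shape
    states = np.arange(n_states)
    v_next = np.zeros(n_states)          # value-to-go after the final step
    chosen = []
    for t in reversed(range(horizon)):
        # q[s, a]: act with a now, then follow the already-chosen later policies
        q = R + np.einsum("asn,n->sa", P, v_next)
        # cand_values[k, s]: value of candidate k at state s for this step
        cand_values = q[states, candidates]
        # score each candidate by its expected value under the baseline mu[t]
        scores = cand_values @ mu[t]
        best = int(np.argmax(scores))
        chosen.append(candidates[best])
        v_next = cand_values[best]
    chosen.reverse()                     # put policies in time order
    return np.array(chosen), v_next

# Toy problem (assumed): action 0 stays, action 1 swaps the two states;
# being in state 1 pays reward 1 regardless of the action taken.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 0.0], [1.0, 1.0]])
mu = np.full((2, 2), 0.5)                # uniform baseline at both time steps
candidates = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

policies, values = psdp(P, R, mu, candidates)
```

Because the loop makes exactly one pass over the time steps, it stops after a fixed number of iterations, which is the finite-termination property the abstract mentions. In this toy setup the first-step policy moves from state 0 to state 1 and stays in state 1.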

J. Andrew Bagnell, Sham Kakade, Andrew Y. Ng, Jeff G. Schneider

NIPS 2003 | NIPS 2007 | Policy Search | Policy Search Algorithm | Policy Search Approach

» Focused Real-Time Dynamic Programming for MDPs: Squeezing More Out of a Heuristic

» Space-indexed dynamic programming: learning to follow trajectories

» Pathologies of temporal difference methods in approximate dynamic programming

» Taming Decentralized POMDPs: Towards Efficient Policy Computation for Multiagent Settings

» A Fast Scheme to Investigate Thermal-Aware Scheduling Policy for Multicore Processors

» Paging and Registration in Cellular Networks: Jointly Optimal Policies and an Iterative Alg...

» Security-typed programming within dependently typed programming

» Exact Dynamic Programming for Decentralized POMDPs with Lossless Policy Compression

Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2003
Where	NIPS
Authors	J. Andrew Bagnell, Sham Kakade, Andrew Y. Ng, Jeff G. Schneider
