Incremental Natural Actor-Critic Algorithms

We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their compatibility with function approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further reduce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods...
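
The actor-critic structure described in the abstract can be illustrated with a minimal sketch. This is not one of the paper's four algorithms: it is a textbook-style, discounted, tabular one-step actor-critic with an ordinary (not natural) gradient, run on a hypothetical two-state toy problem invented here for illustration. The names `step`, `run_actor_critic`, and all step sizes are assumptions. The critic uses a larger step size than the actor, which loosely mirrors the two-timescale idea mentioned above.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def step(state, action, rng):
    """Hypothetical toy MDP: reward 1 if the action index matches the
    state index, else 0; the next state is drawn uniformly at random."""
    reward = 1.0 if action == state else 0.0
    next_state = rng.randrange(2)
    return reward, next_state


def run_actor_critic(num_steps=20000, gamma=0.9,
                     alpha_critic=0.1, alpha_actor=0.01, seed=0):
    rng = random.Random(seed)
    v = [0.0, 0.0]                    # critic: one value per state
    theta = [[0.0, 0.0], [0.0, 0.0]]  # actor: preferences per state/action
    state = 0
    for _ in range(num_steps):
        probs = softmax(theta[state])
        action = 0 if rng.random() < probs[0] else 1
        reward, next_state = step(state, action, rng)

        # Critic: temporal-difference error and TD(0) value update.
        delta = reward + gamma * v[next_state] - v[state]
        v[state] += alpha_critic * delta

        # Actor: stochastic policy-gradient step, using the TD error
        # in place of a sampled return. For a softmax policy,
        # d/d theta[a] log pi(action) = 1{a == action} - pi(a).
        for a in range(2):
            grad_log = (1.0 if a == action else 0.0) - probs[a]
            theta[state][a] += alpha_actor * delta * grad_log

        state = next_state
    return v, [softmax(row) for row in theta]


if __name__ == "__main__":
    values, policy = run_actor_critic()
    print("state values:", values)
    print("policy:", policy)
```

In this sketch the TD error plays the role of the gradient signal for the actor, which is the mechanism the abstract credits with reducing variance compared with using sampled returns. A natural-gradient variant would additionally rescale the actor update; that step is deliberately left out here.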

Shalabh Bhatnagar, Richard S. Sutton, Mohammad Gha

Gradient | Information Technology | NIPS 2007 | Reinforcement Learning | Temporal Difference Learning |

» Applying the Episodic Natural Actor-Critic Architecture to Motor Primitive Learning

» Natural Actor-Critic

» Ensemble Algorithms in Reinforcement Learning

» Generalized Hebbian Algorithm for Incremental Singular Value Decomposition in Natural Lang...

» Reinforcement Learning for Parameterized Motor Primitives

» An Incremental Hausdorff Distance Calculation Algorithm

» Maintaining distributed logic programs incrementally

» Bayesian update of dialogue state: A POMDP framework for spoken dialogue systems

Post Info

Added	30 Oct 2010
Updated	30 Oct 2010
Type	Conference
Year	2007
Where	NIPS
Authors	Shalabh Bhatnagar, Richard S. Sutton, Mohammad Ghavamzadeh, Mark Lee
