Gradient Descent for General Reinforcement Learning


Available from www.ri.cmu.edu

A simple learning rule, the VAPS algorithm, is derived; it can be instantiated to generate a wide range of new reinforcement-learning algorithms. These algorithms solve a number of open problems, define several new approaches to reinforcement learning, and unify different approaches to reinforcement learning under a single theory. All of them have guaranteed convergence, and they include modifications of several existing algorithms that were known to fail to converge on simple MDPs, such as Q-learning, SARSA, and advantage learning. In addition to these value-based algorithms, the rule also generates pure policy-search reinforcement-learning algorithms, which learn optimal policies without learning a value function. It also allows policy-search and value-based algorithms to be combined, unifying two very different approaches to reinforcement learning into a single Value and Policy Search (VAPS) algorithm. Finally, these algorithms converge for POMDPs without requiring a prop...
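The abstract describes VAPS as one learning rule that can blend a value-based error with a policy-search error. The sketch below is one possible reading of that idea, not the paper's exact rule. It assumes a tabular Q function driving a Boltzmann policy, an error that mixes a SARSA residual (weight `1 - beta`) with a policy-search term `b - gamma**t * r` (weight `beta`), and an update of the form w ← w − α(∂e/∂w + e·T), where T is a running sum of ∇ ln π over the actions taken so far. The function names, the toy two-state dynamics, and all parameter values are hypothetical.

```python
import numpy as np


def boltzmann(q_row, temp):
    """Softmax (Boltzmann) action probabilities for one state, numerically stabilised."""
    z = (q_row - q_row.max()) / temp
    p = np.exp(z)
    return p / p.sum()


def grad_log_policy(Q, s, a, temp):
    """Gradient of ln pi(a|s) with respect to every entry of the tabular Q."""
    g = np.zeros_like(Q)
    g[s] = -boltzmann(Q[s], temp) / temp  # -pi(.|s)/temp for every action in state s
    g[s, a] += 1.0 / temp                 # +1/temp for the action actually taken
    return g


def vaps_update(Q, trace, s, a, r, s2, a2, t,
                alpha=0.1, beta=0.5, gamma=0.9, b=0.0):
    """Hypothetical VAPS-style step: Q <- Q - alpha * (de/dQ + e * trace).

    Assumes `trace` already holds the sum of grad ln pi for every action
    taken so far (including a and a2). Returns a new array; Q is not modified.
    """
    # SARSA residual and the full gradient of 0.5 * delta**2 (residual-gradient form,
    # differentiating through both the successor and the current Q value).
    delta = r + gamma * Q[s2, a2] - Q[s, a]
    grad_sarsa = np.zeros_like(Q)
    grad_sarsa[s2, a2] += gamma * delta
    grad_sarsa[s, a] -= delta
    # Assumed policy-search error: baseline minus discounted reward; it has no
    # direct dependence on Q, so only its trace term contributes to the update.
    e = (1 - beta) * 0.5 * delta ** 2 + beta * (b - gamma ** t * r)
    grad_e = (1 - beta) * grad_sarsa
    return Q - alpha * (grad_e + e * trace)


# Usage on a hypothetical toy problem: 2 states, 2 actions, and the chosen
# action deterministically selects the next state; reward 1 for reaching state 1.
rng = np.random.default_rng(0)
temp = 1.0
Q = np.zeros((2, 2))
s = 0
a = int(rng.choice(2, p=boltzmann(Q[s], temp)))
trace = grad_log_policy(Q, s, a, temp)
for t in range(5):
    s2 = a
    r = 1.0 if s2 == 1 else 0.0
    a2 = int(rng.choice(2, p=boltzmann(Q[s2], temp)))
    trace = trace + grad_log_policy(Q, s2, a2, temp)
    Q = vaps_update(Q, trace, s, a, r, s2, a2, t)
    s, a = s2, a2
```

In this sketch, `beta=0` leaves a SARSA-like residual rule, while `beta=1` drops the value error and reduces to a pure policy-search update in which Q merely parameterises the policy. The `e * trace` term is what lets the update account for how changing the weights changes which actions, and therefore which histories, occur.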

Leemon C. Baird III, Andrew W. Moore


Algorithms | NIPS 1998 | NIPS 2007 | Reinforcement Learning | Value-based Algorithms


Post Info

Added	01 Nov 2010
Updated	01 Nov 2010
Type	Conference
Year	1998
Where	NIPS
Authors	Leemon C. Baird III, Andrew W. Moore
