Temporal Difference Based Actor Critic Learning - Convergence and Neural Implementation

Actor-critic algorithms for reinforcement learning are achieving renewed popularity due to their good convergence properties in situations where other approaches often fail (e.g., when function approximation is involved). Interestingly, there is growing evidence that actor-critic approaches based on phasic dopamine signals play a key role in biological learning through cortical and basal ganglia loops. We derive a temporal-difference-based actor-critic learning algorithm for which convergence can be proved without assuming widely separated time scales for the actor and the critic. The approach is demonstrated by applying it to networks of spiking neurons. The established relation between phasic dopamine and the temporal difference signal lends support to the biological relevance of such algorithms.
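
For readers unfamiliar with the general idea, the following is a minimal sketch of a generic one-step temporal difference actor-critic update in tabular form. It is not the algorithm derived in the paper, whose details are not given on this page. The discounted formulation, the two-state toy task, the softmax policy, and all names (`run_actor_critic`, `prefs`, `values`, the step sizes) are assumptions made purely for illustration. The actor and critic step sizes are set equal only as a simple choice; nothing here should be read as a convergence claim.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_actor_critic(steps=5000, gamma=0.9, alpha_critic=0.1,
                     alpha_actor=0.1, seed=0):
    """Tabular one-step TD actor-critic on a hypothetical two-state toy task.

    Toy task (an assumption for illustration, not taken from the paper):
    the reward is 1 when the chosen action index equals the current state
    index, and the next state is drawn uniformly at random.
    """
    rng = random.Random(seed)
    n_states, n_actions = 2, 2
    values = [0.0] * n_states                             # critic: V(s)
    prefs = [[0.0] * n_actions for _ in range(n_states)]  # actor: theta(s, a)

    state = 0
    for _ in range(steps):
        probs = softmax(prefs[state])
        action = rng.choices(range(n_actions), weights=probs)[0]
        reward = 1.0 if action == state else 0.0
        next_state = rng.randrange(n_states)

        # Temporal difference signal shared by the critic and the actor.
        td_error = reward + gamma * values[next_state] - values[state]

        # Critic update: move V(s) toward the bootstrapped target.
        values[state] += alpha_critic * td_error

        # Actor update: softmax policy-gradient step scaled by the TD error.
        for a in range(n_actions):
            indicator = 1.0 if a == action else 0.0
            prefs[state][a] += alpha_actor * td_error * (indicator - probs[a])

        state = next_state

    policy = [softmax(p) for p in prefs]
    return policy, values


if __name__ == "__main__":
    policy, values = run_actor_critic()
    print("policy:", policy)
    print("values:", values)
```

In this sketch the single scalar `td_error` drives both updates, which is the role the abstract attributes to the temporal difference signal. On the toy task the learned policy should come to prefer the action that matches the current state.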

Dotan Di Castro, Dmitry Volkinshtein, Ron Meir

Information Technology | NIPS 2008 | Phasic Dopamine | Phasic Dopamine Signals | Temporal Difference

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	NIPS
Authors	Dotan Di Castro, Dmitry Volkinshtein, Ron Meir
