Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning


Download www.kyb.tuebingen.mpg.de

Most conventional Policy Gradient Reinforcement Learning (PGRL) algorithms neglect (or do not explicitly make use of) a term in the average reward gradient with respect to the policy parameter. That term involves the derivative of the stationary state distribution, which corresponds to the sensitivity of that distribution to changes in the policy parameter. Although the bias introduced by this omission can be reduced by setting the forgetting rate γ for the value functions close to 1, these algorithms do not permit γ to
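The role of the neglected term can be illustrated with a minimal sketch. It is not the paper's algorithm: the two-state, two-action MDP, its transition and reward numbers, the single-parameter sigmoid policy, and every function name below are hypothetical assumptions chosen for illustration. The sketch splits the average reward gradient into a policy term (weighted by the derivative of log π) and a stationary distribution term (weighted by the derivative of log d), and the latter is the kind of term the abstract says is usually left out.

```python
import math

# Hypothetical 2-state, 2-action MDP; all numbers are illustrative assumptions.
P_NEXT1 = [[0.9, 0.2], [0.3, 0.8]]  # P_NEXT1[s][a] = Pr(next state = 1 | s, a)
REWARD = [[0.0, 1.0], [2.0, 0.5]]   # REWARD[s][a]


def policy(theta):
    """Pr(a | s) for a one-parameter sigmoid policy shared by both states."""
    p1 = 1.0 / (1.0 + math.exp(-theta))
    return [[1.0 - p1, p1], [1.0 - p1, p1]]


def stationary(theta):
    """Closed-form stationary distribution of the induced 2-state chain."""
    pi = policy(theta)
    to1_from0 = sum(pi[0][a] * P_NEXT1[0][a] for a in range(2))
    to0_from1 = sum(pi[1][a] * (1.0 - P_NEXT1[1][a]) for a in range(2))
    z = to1_from0 + to0_from1
    return [to0_from1 / z, to1_from0 / z]


def average_reward(theta):
    pi, d = policy(theta), stationary(theta)
    return sum(d[s] * pi[s][a] * REWARD[s][a]
               for s in range(2) for a in range(2))


def gradient_terms(theta, eps=1e-6):
    """Return (policy term, stationary distribution term) of d(avg reward)/d(theta)."""
    pi, d = policy(theta), stationary(theta)
    p1 = pi[0][1]
    dlog_pi = [-p1, 1.0 - p1]  # d/dtheta of log pi(a | s) for a = 0, 1
    # d/dtheta of log d(s), estimated here by central finite differences
    d_hi, d_lo = stationary(theta + eps), stationary(theta - eps)
    dlog_d = [(d_hi[s] - d_lo[s]) / (2.0 * eps) / d[s] for s in range(2)]
    policy_term = sum(d[s] * pi[s][a] * REWARD[s][a] * dlog_pi[a]
                      for s in range(2) for a in range(2))
    distribution_term = sum(d[s] * pi[s][a] * REWARD[s][a] * dlog_d[s]
                            for s in range(2) for a in range(2))
    return policy_term, distribution_term


if __name__ == "__main__":
    pt, dt = gradient_terms(0.0)
    print("policy term:", pt)
    print("stationary distribution term:", dt)
    print("sum:", pt + dt)
```

In this toy setting the two terms together match a finite-difference estimate of the average reward gradient, while the policy term alone does not, which is one way to see the bias that comes from dropping the stationary distribution term.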

Tetsuro Morimura, Eiji Uchibe, Junichiro Yoshimoto


Average Reward Gradient | NECO 2010 | Policy Gradient Reinforcement | Policy Parameter


Post Info

Added	29 Jan 2011
Updated	29 Jan 2011
Type	Journal
Year	2010
Where	NECO
Authors	Tetsuro Morimura, Eiji Uchibe, Junichiro Yoshimoto, Jan Peters, Kenji Doya


Sciweavers
