Sciweavers

173

NECO
2010

97views more NECO 2010»

Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning

15 years 5 months ago

Most conventional Policy Gradient Reinforcement Learning (PGRL) algorithms neglect (or do not explicitly make use of) a term in the average reward gradient with respect to the pol...

Tetsuro Morimura, Eiji Uchibe, Junichiro Yoshimoto...

claim paper

Read More »

Sciweavers

Explore & Download

Productivity Tools

Sciweavers