Recursive Adaptation of Stepsize Parameter for Non-stationary Environments



In this article, we propose a method to adapt the stepsize parameter used in reinforcement learning for dynamic environments. In typical reinforcement learning settings, the stepsize parameter is decreased to zero during learning, because the environment is assumed to be noisy but stationary, so that the true expected rewards are fixed. In the real world, however, we assume that the true expected reward changes over time, and hence the learning agent must adapt to the change through continuous learning. We derive the higher-order derivatives of the exponential moving average (which is used to estimate the expected values of states or actions in most reinforcement learning methods) with respect to the stepsize parameter. We also illustrate a mechanism to calculate these derivatives recursively. Using this mechanism, we construct a precise and flexible adaptation method for the stepsize parameter in order to minimize squared error or maximize a certain criterion. The proposed method...
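
The recursion mentioned in the abstract can be illustrated with a small sketch. It assumes the standard exponential moving average update x_{t+1} = (1 - alpha) x_t + alpha r_t with an initial estimate that does not depend on alpha; differentiating both sides k times with respect to alpha gives a recursion for each derivative. The function names, the learning rate `eta`, and the clipping bounds are hypothetical and chosen only for illustration. The second function is a simplified first-order gradient step on the squared error, not the adaptation method proposed in the paper.

```python
def ema_with_derivatives(rewards, alpha, order=2, x0=0.0):
    """Run an EMA over `rewards` and track its derivatives w.r.t. alpha.

    Returns a list d where d[0] is the final EMA value and d[k] is its
    k-th derivative with respect to alpha (assumes x0 is independent of alpha).
    """
    d = [x0] + [0.0] * order
    for r in rewards:
        new = [0.0] * (order + 1)
        # EMA update: x' = (1 - alpha) * x + alpha * r
        new[0] = (1 - alpha) * d[0] + alpha * r
        if order >= 1:
            # First derivative: differentiate the update once w.r.t. alpha.
            new[1] = (1 - alpha) * d[1] + (r - d[0])
        for k in range(2, order + 1):
            # Higher orders (Leibniz rule): the alpha * r term vanishes for k >= 2.
            new[k] = (1 - alpha) * d[k] - k * d[k - 1]
        d = new
    return d


def adapt_stepsize(rewards, alpha=0.1, eta=0.01,
                   alpha_min=0.01, alpha_max=1.0, x0=0.0):
    """Illustrative first-order scheme (an assumption, not the paper's method):
    nudge alpha along the negative gradient of the squared error (r - x)^2,
    using the recursively tracked first derivative dx = dx/dalpha.
    """
    x, dx = x0, 0.0
    for r in rewards:
        err = r - x
        # d/dalpha of (r - x)^2 is -2 * err * dx; the factor 2 is folded into eta.
        alpha = min(alpha_max, max(alpha_min, alpha + eta * err * dx))
        # Update the derivative with the old x (via err), then the estimate itself.
        x, dx = x + alpha * err, (1 - alpha) * dx + err
    return x, alpha
```

For example, `ema_with_derivatives([1.0, 2.0, 3.0], alpha=0.5)` returns the estimate together with its first and second derivatives, which can be checked against finite differences of the plain EMA evaluated at nearby values of alpha.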

Itsuki Noda


Artificial Intelligence | PRIMA 2009 | Reinforcement Learning | Stepsize Parameter | True Expected Reward |


Post Info

Added	27 May 2010
Updated	27 May 2010
Type	Conference
Year	2009
Where	PRIMA
Authors	Itsuki Noda
