


SIAMCO
2000


The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning




It is shown here that stability of the stochastic approximation algorithm is implied by the asymptotic stability of the origin for an associated ODE. This in turn implies convergence of the algorithm. Several specific classes of algorithms are considered as applications. It is found that the results provide (i) a simpler derivation of known results for reinforcement learning algorithms; (ii) the first proof that a class of asynchronous stochastic approximation algorithms is convergent without any a priori assumption of stability; and (iii) the first proof that asynchronous adaptive critic and Q-learning algorithms are convergent for the average cost optimal control problem.

Key words: stochastic approximation, ODE method, stability, asynchronous algorithms, reinforcement learning

AMS subject classifications: 62L20, 93E25, 93E15

PII: S0363012997331639
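To make the setting concrete: the algorithms in question have the form x(n+1) = x(n) + a(n)[h(x(n)) + M(n+1)], where a(n) are step sizes and M(n+1) is martingale-difference noise. In the Borkar–Meyn result the "associated ODE" is, roughly, the scaled limit dx/dt = h_inf(x) with h_inf(x) = lim_{c→∞} h(cx)/c; global asymptotic stability of its origin yields boundedness of the iterates, after which the standard ODE argument gives convergence. The sketch below is a hypothetical toy example, not code from the paper. It assumes a linear mean field h(x) = Ax + b with a Hurwitz matrix A (so h_inf(x) = Ax), step sizes a(n) = 1/(n+1), and i.i.d. Gaussian noise; the names A, b, h, h_inf, equilibrium and run_sa are all illustrative.

```python
import random

# Hypothetical toy problem (not from the paper): linear mean field h(x) = A x + b.
# A is upper triangular with eigenvalues -1 and -2, hence Hurwitz.
A = [[-1.0, 0.5],
     [0.0, -2.0]]
b = [1.0, 2.0]


def h(x):
    """Mean field h(x) = A x + b for a 2-dimensional x."""
    return [A[i][0] * x[0] + A[i][1] * x[1] + b[i] for i in range(2)]


def h_inf(x, c=1e6):
    """Approximate the scaled limit h(c x) / c; for this h it tends to A x as c grows."""
    return [v / c for v in h([c * x[0], c * x[1]])]


def equilibrium():
    """Solve h(x) = 0 (i.e. A x = -b) by back substitution, using that A is upper triangular."""
    x1 = -b[1] / A[1][1]
    x0 = (-b[0] - A[0][1] * x1) / A[0][0]
    return [x0, x1]


def run_sa(x_init, n_steps, noise_std=1.0, seed=0):
    """Iterate x(n+1) = x(n) + a(n) [h(x(n)) + M(n+1)] with a(n) = 1/(n+1).

    a(n) satisfies sum a(n) = inf and sum a(n)^2 < inf; M(n+1) is i.i.d.
    zero-mean Gaussian noise, a simple martingale-difference sequence.
    """
    rng = random.Random(seed)
    x = list(x_init)
    for n in range(n_steps):
        a_n = 1.0 / (n + 1)
        hx = h(x)
        x = [x[i] + a_n * (hx[i] + rng.gauss(0.0, noise_std)) for i in range(2)]
    return x


if __name__ == "__main__":
    print("h_inf([1, 1]) ~", h_inf([1.0, 1.0]))  # close to A x = [-0.5, -2.0]
    print("equilibrium:", equilibrium())         # [1.5, 1.0]
    print("SA estimate:", run_sa([100.0, -100.0], 100_000))
```

A linear h keeps the example checkable by hand: the root of h is (1.5, 1.0), and h(cx)/c approaches Ax as c grows. In the paper's framework it is the stability of the scaled ODE dx/dt = h_inf(x), here dx/dt = Ax, that guarantees the iterates stay bounded, rather than a separate a priori boundedness assumption.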

Vivek S. Borkar, Sean P. Meyn




Related Content

» Convergence Problems of General-Sum Multiagent Reinforcement Learning

» A Convergent Reinforcement Learning Algorithm in the Continuous Case: The Finite-Element Rei...

» Localizing Search in Reinforcement Learning

» Incremental Natural Actor-Critic Algorithms

» Reinforcement Learning for Average Reward Zero-Sum Games

» Convergence and Divergence in Standard and Averaging Reinforcement Learning

» On step-sizes, stochastic shortest paths, and survival probabilities in Reinforcement Learni...

» Multiagent learning using a variable learning rate

» Tracking value function dynamics to improve reinforcement learning with piecewise linear f...

Post Info

Added	19 Dec 2010
Updated	19 Dec 2010
Type	Journal
Year	2000
Where	SIAMCO
Authors	Vivek S. Borkar, Sean P. Meyn
