It is shown here that stability of the stochastic approximation algorithm is implied by the asymptotic stability of the origin for an associated ODE. This in turn implies convergen...
Temporal difference (TD) algorithms are attractive for reinforcement learning due to their ease-of-implementation and use of "bootstrapped" return estimates to make effi...
To appear in: G. Tesauro, D. S. Touretzky and T. K. Leen, eds., Advances in Neural Information Processing Systems 7, MIT Press, Cambridge MA, 1995. A straightforward approach to t...
We present an expressive agent design language for reinforcement learning that allows the user to constrain the policies considered by the learning process.The language includes s...
Multi-Agent Reinforcement Learning (MARL) algorithms suffer from slow convergence and even divergence, especially in large-scale systems. In this work, we develop a supervision fr...
Chongjie Zhang, Sherief Abdallah, Victor R. Lesser