Sciweavers

LAMAS
2005
Springer

Unifying Convergence and No-Regret in Multiagent Learning

We present a new multiagent learning algorithm, RVσ(t), that builds on an earlier version, ReDVaLeR. ReDVaLeR could guarantee (a) convergence to best response against stationary opponents and either (b) constant bounded regret against arbitrary opponents, or (c) convergence to Nash equilibrium policies in self-play. But it makes two strong assumptions: (1) that it can distinguish between self-play and otherwise non-stationary agents and (2) that all agents know their portions of the same equilibrium in self-play. We show that the adaptive learning rate of RVσ(t), which depends explicitly on time, can overcome both of these assumptions. Consequently, RVσ(t) theoretically achieves (a’) convergence to near-best response against eventually stationary opponents, (b’) no-regret payoff against arbitrary opponents and (c’) convergence to some Nash equilibrium policy in some classes of games, in self-play. Each agent now needs to know its portion of any equilibrium, and does not need t...
Bikramjit Banerjee, Jing Peng
Added 28 Jun 2010
Updated 28 Jun 2010
Type Conference
Year 2005
Where LAMAS
Authors Bikramjit Banerjee, Jing Peng