Regret-based algorithms have been proposed to control a wide variety of multi-agent systems. Their appeal is that (1) they are easy to implement in large-scale multi-agent systems and (2) existing results prove that their behavior converges asymptotically to a set of “no-regret” points in any game. We illustrate, through a simple example, that no-regret points need not reflect desirable operating conditions for a multi-agent system. Multi-agent systems often exhibit additional structure (i.e., being “weakly acyclic”) that has not been exploited in the context of regret-based algorithms. In this paper, we introduce a modification of regret-based algorithms by (1) exponentially discounting the memory and (2) introducing a notion of inertia into the players’ decision process. We show how these modifications can lead to an entire class of regret-based algorithms that provide almost sure convergence to a pure Nash equilibrium i...
Jason R. Marden, Gürdal Arslan, Jeff S. Shamma
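
The two modifications named in the abstract, exponentially discounted memory and inertia, can be illustrated with a minimal Python sketch. It is one possible reading of the idea, not the authors' algorithm: the two-player coordination game, the names `payoff`, `step`, and `simulate`, the parameters `rho` (discount) and `inertia`, and the choice to sample actions in proportion to positive regret are all assumptions made for illustration.

```python
import random

def payoff(player, actions):
    # Hypothetical coordination game: both players earn 1 when they match.
    return 1.0 if actions[0] == actions[1] else 0.0

def step(actions, regrets, rho, inertia, rng, n_actions=2):
    """One simultaneous update; mutates `regrets`, returns the next joint action."""
    new_actions = list(actions)
    for i in range(len(actions)):
        realized = payoff(i, actions)
        for a in range(n_actions):
            alt = list(actions)
            alt[i] = a
            instant = payoff(i, alt) - realized
            # Exponentially discounted (fading-memory) regret: assumed form.
            regrets[i][a] = (1 - rho) * regrets[i][a] + rho * instant
        positive = [max(r, 0.0) for r in regrets[i]]
        # Inertia: repeat the previous action with probability `inertia`;
        # also stay put when no action has positive regret.
        if sum(positive) > 0 and rng.random() > inertia:
            new_actions[i] = rng.choices(range(n_actions), weights=positive)[0]
    return new_actions

def simulate(steps=500, rho=0.1, inertia=0.5, seed=0):
    rng = random.Random(seed)
    actions = [0, 1]  # start from a mismatched profile
    regrets = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(steps):
        actions = step(actions, regrets, rho, inertia, rng)
    return actions, regrets
```

Calling `simulate()` returns the final joint action and the regret table. In this toy game the matching profiles are the pure Nash equilibria, so one would expect a long run to settle on one of them, although the sketch itself proves nothing about convergence.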