This paper examines, by argument, the dynamics of sequences of behavioural choices made, when non-cooperative restricted-memory agents learn in partially observable stochastic gam...
We introduce the ALeRT (Action-dependent Learning Rates with Trends) algorithm that makes two modifications to the learning rate and one change to the exploration rate of traditio...
Maria Cutumisu, Duane Szafron, Michael H. Bowling,...
We apply CMA-ES, an evolution strategy with covariance matrix adaptation, and TDL (Temporal Difference Learning) to reinforcement learning tasks. In both cases these algorithms se...
Abstract In this paper we address the problem of simultaneous learning and coordination in multiagent Markov decision problems (MMDPs) with infinite state-spaces. We separate this ...
We present a new multiagent learning algorithm, RVσ(t), that builds on an earlier version, ReDVaLeR . ReDVaLeR could guarantee (a) convergence to best response against stationary ...