— Multi-agent systems (MAS) are a field of study of growing interest in a variety of domains such as robotics or distributed controls. The article focuses on decentralized reinforcement learning (RL) in cooperative MAS, where a team of independent learning robots (IL) try to coordinate their individual behavior to reach a coherent joint behavior. We assume that each robot has no information about its teammates’ actions. To date, RL approaches for such ILs did not guarantee convergence to the optimal joint policy in scenarios where the coordination is difficult. We report an investigation of existing algorithms for the learning of coordination in cooperative MAS, and suggest a Q-Learning extension for ILs, called Hysteretic Q-Learning. This algorithm does not require any additional communication between robots. Its advantages are showing off and compared to other methods on various applications : bimatrix games, collaborative ball balancing task and pursuit domain.
Laëtitia Matignon, Guillaume J. Laurent, Nadi