Decentralized reinforcement learning (DRL) has been applied to a number of distributed applications. However, one of the main challenges faced by DRL is achieving convergence. Previous work has shown that hierarchical organizational control is an effective way of coordinating DRL to improve its speed, quality, and likelihood of convergence. In this paper, we develop a distributed, negotiation-based approach to dynamically forming such hierarchical organizations. To reduce the complexity of coordinating DRL, our self-organization approach groups strongly interacting learning agents together, with each group's exploration strategies coordinated by one supervisor. We formalize this idea by characterizing interactions among agents in a decentralized Markov decision process model and by defining and analyzing a measure that explicitly captures the strength of such interactions. Experimental results show that our dynamically evolving organizations outperform predefined organizations for coordinating DRL. ...
Chongjie Zhang, Victor R. Lesser, Sherief Abdallah