To cope with large scale, agents are usually organized in a network such that an agent interacts only with its immediate neighbors. Reinforcement learning techniques have been commonly used to optimize agents’ local policies in such a network because they require little domain knowledge and can be fully distributed. However, all previous work assumed the underlying network was fixed throughout the learning process. This assumption was important because the underlying network defines the learning context of each agent: in particular, the set of actions and the state space of each agent are defined in terms of the agent’s neighbors. If agents dynamically change the underlying network structure (also called self-organizing) during learning, then a mechanism is needed for transferring what agents have learned so far (in the old network structure) to their new learning context (in the new network structure). In this work we develop a novel self-organization...
Sherief Abdallah, Victor R. Lesser
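To make the dependence on network structure concrete, the following is a minimal sketch (not the paper's mechanism) of a Q-learning-style agent whose action set is its current neighbor set, together with a deliberately naive transfer rule applied when the neighborhood changes. The class name `NetworkedQAgent`, the state encoding, and the `reorganize` transfer rule are all illustrative assumptions, not details from the abstract.

```python
import random
from collections import defaultdict

class NetworkedQAgent:
    """Illustrative agent: its action set is its current neighbor set,
    so its Q-table is only meaningful for the current network structure."""

    def __init__(self, neighbors, alpha=0.1, epsilon=0.1):
        self.neighbors = set(neighbors)  # neighbors define the action space
        self.alpha = alpha               # learning rate
        self.epsilon = epsilon           # exploration rate
        self.q = defaultdict(float)      # Q[(state, neighbor)] -> value

    def act(self, state):
        # epsilon-greedy choice over the neighbor set (the action space)
        if random.random() < self.epsilon:
            return random.choice(sorted(self.neighbors))
        return max(sorted(self.neighbors), key=lambda n: self.q[(state, n)])

    def update(self, state, action, reward):
        # one-step Q update (bandit-style, for brevity)
        self.q[(state, action)] += self.alpha * (reward - self.q[(state, action)])

    def reorganize(self, new_neighbors):
        """Naive transfer when the network changes: keep Q-values that
        still refer to retained neighbors, drop the rest, and let new
        neighbors start from the default value."""
        new_neighbors = set(new_neighbors)
        self.q = defaultdict(float, {
            (s, n): v for (s, n), v in self.q.items() if n in new_neighbors
        })
        self.neighbors = new_neighbors

if __name__ == "__main__":
    agent = NetworkedQAgent(neighbors=["B", "C"])
    agent.update(state="overloaded", action="B", reward=1.0)
    agent.reorganize(["C", "D"])    # neighbor B dropped, D added
    print(agent.act("overloaded"))  # now chooses among C and D only
```

Under this naive rule, everything learned about dropped neighbors is discarded and new neighbors start from scratch; designing a transfer mechanism that does better than this is exactly the problem the abstract identifies.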