We consider real-time multi-agent coordination in a dynamic and uncertain domain in which state information is distributed and knowledge of the common reward function is partial. The challenge is to find functional strategies when bounded rationality prevents agents from accounting for the values of all possible sample paths of the system. This paper discusses a new approach in which agents are assigned to monitor portions of the reward structure; for those portions, they aggregate and propagate profiles that compactly represent the information relevant to policy modification. The approach shows promise as an alternative, and potentially superior, technique to current decision-theoretic and scheduling approaches.
Rajiv T. Maheswaran, Craig Milo Rogers, Romeo Sanc