Planning under uncertainty is an important and challenging problem in multiagent systems. Multiagent Partially Observable Markov Decision Processes (MPOMDPs) provide a powerful framework for optimal decision making under the assumption of instantaneous communication. We focus on a delayed communication setting (MPOMDP-DC), in which broadcasted information is delayed by at most one time step. This model allows agents to act on their most recent (private) observation. Such an assumption is a strict generalization over having agents wait until the global information is available and is more appropriate for applications in which response time is critical. From a technical point of view, MPOMDP-DCs are quite similar to MPOMDPs. However, value function backups are significantly more costly, and naive application of incremental pruning, the core of many state-of-the-art optimal POMDP techniques, is intractable. In this paper, we show how to overcome this problem by demonstrating that comput...
Frans Adriaan Oliehoek, Matthijs T. J. Spaan