Decentralized partially observable Markov decision process (DEC-POMDP) is an approach to model multi-robot decision making problems under uncertainty. Since it is NEXP-complete there is no efficient exact algorithm to solve these problems and in spite of the attention it has taken recently, so far only a few approximate solutions that can solve small problems have been proposed. In this study, we offer a novel approximate solution algorithm for DEC-POMDP problems using evolution strategies, and a novel approach to approximately calculate the fitness of the chromosomes which correspond to the expected reward. We also propose a new problem which is a more complex, modified version of the grid meeting problem and solve it. Our results show that our algorithm is scalable and we can solve problems that have more states than the problems attempted in previous studies. Keywords Evolution strategies
Baris Eker, H. Levent Akin