We adopt the decision-theoretic principle of expected utility maximization as a paradigm for designing autonomous rational agents operating in multi-agent environments. We use the formalism of partially observable Markov decision processes and generalize it to include the presence of other agents. Under common assumptions, a belief-state MDP can be defined using agents’ beliefs, which include an agent’s knowledge about the environment and about the other agents, including their knowledge about others’ states of knowledge. The resulting solution corresponds to what has been called the decision-theoretic approach to game theory. Our approach complements the more traditional game-theoretic approach based on equilibria. Equilibria may be non-unique and do not capture off-equilibrium behaviors. Our approach seeks to avoid these problems, but does so at the cost of having to represent, process, and continually update the complex nested state of an agent’s knowledge.
Piotr J. Gmytrasiewicz