Multi-agent reinforcement learning (MARL) is an emerging area of research. However, it lacks two important elements: a coherent view of the field and a well-defined problem objective. We demonstrate these points by introducing three phenomena (social norms, teaching, and bounded rationality) that previous research has addressed inadequately. Building on the idea of bounded rationality, we define a very broad class of MARL problems that are equivalent to learning in partially observable Markov decision processes (POMDPs). We show that this perspective on MARL not only accounts for the three missing phenomena, but also provides a well-defined objective for a learner, since POMDPs have a well-defined notion of optimality. We illustrate the concept in an empirical study and discuss its implications for future research.
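To make the appealed-to notion of optimality concrete, the following is the standard POMDP formulation and objective; the tuple notation and the discount factor $\gamma$ are the conventional ones, not symbols defined in this abstract, and the specific reduction of MARL to this setting is developed in the body of the paper. A POMDP is given by a tuple
\[
  \mathcal{M} = \langle S, A, \Omega, T, O, R, \gamma \rangle,
\]
where $T(s' \mid s, a)$ is the transition model, $O(o \mid s', a)$ the observation model over $\Omega$, and $R(s, a)$ the reward. A policy $\pi$ maps observation histories to actions, and the well-defined objective is the expected discounted return
\[
  \pi^{\ast} = \arg\max_{\pi} \,
  \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \,\middle|\, \pi \right],
\]
which exists for any $\gamma \in [0, 1)$ and bounded $R$, giving the learner an unambiguous target even when the other agents are folded into the environment dynamics.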