This paper examines, by argument, the dynamics of the sequences of behavioural choices made when non-cooperative, restricted-memory agents learn in partially observable stochastic games. These sequences of combined agent strategies (joint-policies) can be thought of as a walk through the space of all possible joint-policies. We argue that this walk, while containing random elements, is also shaped by each agent's drive to improve its current situation at each point, and we posit a learning pressure field across policy space to represent this drive. Different learning choices may skew this learning pressure and affect the simultaneous joint learning of multiple agents.

Motivation

Multi-Agent Stochastic Processes are becoming increasingly popular as a modelling paradigm. Game-theoretic approaches commonly assume that the participating agents have full access to the process dynamics in advance, and then solve analytically for the best solution, but with large problems this approach is...