Co-learning is a model in which agents from a large population interact by playing a fixed game and update their behaviour based on previous experience and the outcome of this game. The Highest Cumulative Reward rule is an update rule that ensures the emergence of cooperation in a population of agents without centralized control, for various games and interaction topologies. We analyse the convergence rate of this rule when applied to the Iterated Prisoner's Dilemma game, proving that the convergence rate is optimal when the interaction topology is a cycle and exponential when it is a complete graph.
Martin E. Dyer, Leslie Ann Goldberg, Catherine S.