We consider the problem of finding optimal strategies in repeatedly played infinite extensive-form games with incomplete information. This problem is still open in the literature. Game theory provides methods to compute equilibrium strategies only in the presence of perfectly rational agents and a common prior, but these assumptions are unrealistic in many practical settings. When they do not hold, it is common to resort to learning techniques. Nevertheless, the learning literature does not yet provide a mature solution to the problem above. In this paper we present a novel learning principle that aims at avoiding the oscillations in agents' strategies induced by the presence of concurrent learners. We apply our algorithm to alternating-offers bargaining with deadlines, and we evaluate it experimentally, showing that with this principle self-interested reinforcement learning algorithms can improve their convergence time.