Agents that operate in a multi-agent system can benefit significantly from adapting to other agents while interacting with them. This work presents a general architecture for a model-based learning strategy combined with an exploration strategy. This combination enables adaptive agents to learn models of their rivals and to explore their behavior for exploitation in future encounters. We report experimental results in the Iterated Prisoner’s Dilemma domain, demonstrating the superiority of the model-based learning agent over non-adaptive agents and over reinforcement-learning agents. The experimental results also show that exploration can significantly improve the performance of a model-based agent.