A multi-agent system (MAS) architecture consisting of service centers is proposed. Within each service center, a mediator coordinates service delivery by allocating individual tasks to the corresponding task specialist agents according to their prior performance, while anticipating the performance of newly entering agents. Mediator behavior is based on a novel multicriteria-driven reinforcement learning algorithm (the criteria include quality of service, deadline, reputation, cost, and user preferences) that integrates the exploitation of acquired knowledge with optimal, undirected, continual exploration; this ensures adaptability to changes in agent availability and performance. The reported experiments indicate that the algorithm behaves as expected and outperforms two standard approaches.

Categories and Subject Descriptors
I.2.6 [Artificial Intelligence]: Learning; I.2.11 [Artificial Intelligence]: Distributed Artificial Intelligence; D.2.11 [Software Engineering]: Software Architectures

General Terms
Architecture, Algorithms, ...
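
As an illustration of the mediator behavior summarized in the abstract, the following is a minimal sketch, not the algorithm of this paper. It assumes that each criterion is scored in [0, 1], that the criteria are combined with hypothetical fixed weights (`WEIGHTS`), and that Boltzmann (softmax) selection stands in for the randomized exploration strategy. The class and method names (`Mediator`, `register`, `allocate`, `update`) are likewise invented for this example. Newly entering agents receive an optimistic prior estimate so that they are tried early, which is one simple way to anticipate their performance.

```python
import math
import random

# Hypothetical weights: the abstract names the criteria but not how they are combined.
WEIGHTS = {"qos": 0.3, "deadline": 0.2, "reputation": 0.2, "cost": 0.2, "preference": 0.1}


def score(observation):
    """Collapse per-criterion scores in [0, 1] into a single weighted reward."""
    return sum(WEIGHTS[name] * observation[name] for name in WEIGHTS)


class Mediator:
    """Allocates one task type among task specialist agents (illustrative only)."""

    def __init__(self, temperature=0.2, learning_rate=0.3, optimistic_prior=1.0, rng=None):
        self.temperature = temperature            # higher value = more exploration
        self.learning_rate = learning_rate        # step size of the running estimate
        self.optimistic_prior = optimistic_prior  # initial estimate for new agents
        self.estimates = {}                       # agent name -> estimated reward
        self.rng = rng or random.Random()

    def register(self, agent):
        """Add a newly entering agent with an optimistic initial estimate."""
        self.estimates.setdefault(agent, self.optimistic_prior)

    def unregister(self, agent):
        """Remove an agent that is no longer available."""
        self.estimates.pop(agent, None)

    def probabilities(self):
        """Softmax over estimates: better agents are likelier, none is excluded."""
        best = max(self.estimates.values())
        weights = {a: math.exp((v - best) / self.temperature)
                   for a, v in self.estimates.items()}
        total = sum(weights.values())
        return {a: w / total for a, w in weights.items()}

    def allocate(self):
        """Sample an agent for the next task according to the softmax probabilities."""
        probs = self.probabilities()
        agents = list(probs)
        return self.rng.choices(agents, weights=[probs[a] for a in agents], k=1)[0]

    def update(self, agent, observation):
        """Move the agent's estimate toward the reward observed for a finished task."""
        reward = score(observation)
        self.estimates[agent] += self.learning_rate * (reward - self.estimates[agent])
```

Because every registered agent keeps a nonzero selection probability, the mediator continues to sample agents whose estimates are currently low, so it can notice when their performance recovers. Lowering `temperature` shifts the balance toward exploitation.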