Exploiting locality of interactions using a policy-gradient approach in multiagent learning

In this paper, we propose a policy-gradient reinforcement learning algorithm to address transition-independent Dec-POMDPs. The approach aims to implicitly exploit the locality of interaction observed in many practical problems. Our algorithm follows an actor-critic architecture: the actor combines natural-gradient updates with a varying learning rate, while the critic uses only local information to maintain a belief over the joint state space and evaluates the current policy as a function of this belief using compatible function approximation. To speed up convergence, we use an optimistic initialization of the policy that relies on a fully observable, single-agent model of the problem. We illustrate our approach on some simple application problems.
Francisco S. Melo
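
The abstract names its main ingredients (actor-critic architecture, natural-gradient updates, compatible function approximation) without spelling them out. As a rough illustration, here is a minimal single-agent Python sketch of a natural actor-critic with compatible features on a hypothetical 2-state, 2-action toy MDP. The toy dynamics, hyperparameters, and batch least-squares critic are illustrative assumptions, not the paper's transition-independent Dec-POMDP algorithm; the one grounded fact it exercises is that, with compatible features phi(s, a) = grad_theta log pi(a|s), the fitted critic weights coincide with the natural policy gradient (Kakade, 2002).

    import numpy as np

    # Hypothetical toy MDP, NOT from the paper: P[s, a] gives the next
    # state, R[s, a] the reward. Optimal policy: a=0 in s=0, a=1 in s=1.
    rng = np.random.default_rng(0)
    N_S, N_A = 2, 2
    P = np.array([[0, 1], [1, 0]])
    R = np.array([[1.0, 0.0], [0.0, 1.0]])

    theta = np.zeros((N_S, N_A))  # softmax (tabular) policy parameters
    gamma, alpha = 0.95, 0.05     # illustrative hyperparameters

    def policy(s):
        z = np.exp(theta[s] - theta[s].max())
        return z / z.sum()

    def compat_features(s, a):
        # Compatible features: phi(s, a) = grad_theta log pi(a | s),
        # flattened so the critic weights w align with theta.
        g = np.zeros((N_S, N_A))
        g[s] = -policy(s)
        g[s, a] += 1.0
        return g.ravel()

    for it in range(300):
        feats, rets = [], []
        for _ in range(20):               # batch of rollouts (actor samples)
            s, traj = 0, []
            for _ in range(30):
                a = rng.choice(N_A, p=policy(s))
                traj.append((s, a, R[s, a]))
                s = P[s, a]
            G = 0.0
            for s_t, a_t, r_t in reversed(traj):  # discounted returns
                G = r_t + gamma * G
                feats.append(compat_features(s_t, a_t))
                rets.append(G)
        X, y = np.asarray(feats), np.asarray(rets)
        y -= y.mean()                     # constant baseline, reduces variance
        # Critic: least-squares fit of the compatible approximator,
        # w = argmin ||X w - y||^2 (min-norm solution if rank-deficient).
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
        # With compatible features, w IS the natural gradient of the
        # expected return, so the actor step is simply:
        theta += alpha * w.reshape(N_S, N_A)

    print("pi(.|s=0) =", policy(0), " pi(.|s=1) =", policy(1))

Run as-is, the policy concentrates on the rewarding action in each state. In the paper's multiagent setting, the critic would instead maintain a belief over the joint state from local information only and evaluate the joint policy as a function of that belief.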
Type: Conference
Year: 2008
Where: ECAI
Publisher: Springer
Authors: Francisco S. Melo