In this paper, we propose a policy gradient reinforcement learning algorithm for transition-independent Dec-POMDPs. The approach aims to implicitly exploit the locality of interaction observed in many practical problems. Our algorithm follows an actor-critic architecture: the actor combines natural gradient updates with a varying learning rate, while the critic uses only local information to maintain a belief over the joint state space and evaluates the current policy as a function of this belief using compatible function approximation. To speed up convergence, we use an optimistic initialization of the policy based on a fully observable, single-agent model of the problem. We illustrate our approach on several simple application problems.
Francisco S. Melo
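
To make the actor-critic structure sketched in the abstract more concrete, here is a minimal, hypothetical Python sketch of a natural-gradient actor-critic update with compatible function approximation. It simplifies the setting to a single agent in a small, fully observable toy MDP, so the multiagent belief tracking from local observations, the varying learning rate, and the optimistic initialization described above are omitted; all names and numerical choices (n_states, n_actions, toy_step, step sizes) are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not the paper's algorithm): natural-gradient actor-critic
# with compatible function approximation, single agent, tabular softmax policy.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 4, 2
theta = np.zeros((n_states, n_actions))   # policy parameters
w = np.zeros(n_states * n_actions)        # compatible critic weights
alpha, beta, gamma = 0.05, 0.1, 0.95      # actor step, critic step, discount

def policy(s):
    """Softmax (Gibbs) policy over actions in state s."""
    prefs = theta[s] - theta[s].max()
    p = np.exp(prefs)
    return p / p.sum()

def score(s, a):
    """Score function grad_theta log pi(a|s), flattened.
    These are exactly the compatible features used by the critic."""
    g = np.zeros((n_states, n_actions))
    p = policy(s)
    g[s] = -p
    g[s, a] += 1.0
    return g.ravel()

def toy_step(s, a):
    """Arbitrary toy MDP dynamics, purely for illustration."""
    s_next = (s + a) % n_states
    r = 1.0 if s_next == 0 else 0.0
    return s_next, r

s = 0
for t in range(2000):
    a = rng.choice(n_actions, p=policy(s))
    s_next, r = toy_step(s, a)
    a_next = rng.choice(n_actions, p=policy(s_next))

    # Critic: TD-style update of the compatible approximation A(s,a) ~ w.score(s,a)
    phi, phi_next = score(s, a), score(s_next, a_next)
    td_error = r + gamma * (w @ phi_next) - (w @ phi)
    w += beta * td_error * phi

    # Actor: with compatible features, the natural gradient direction
    # coincides with the critic weights w, so the update is theta <- theta + alpha*w.
    theta += alpha * w.reshape(n_states, n_actions)

    s = s_next
```

The key identity relied on in this sketch is the standard result that, when the critic's features are the score function of the policy (compatible function approximation), the natural policy gradient direction coincides with the critic's weight vector w, which is why the actor update reduces to a plain step along w.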