Extending Q-Learning to General Adaptive Multi-Agent Systems

Recent multi-agent extensions of Q-Learning require knowledge of other agents’ payoffs and Q-functions, and assume game-theoretic play at all times by all other agents. This paper proposes a fundamentally different approach, dubbed “Hyper-Q” Learning, in which values of mixed strategies rather than base actions are learned, and in which other agents’ strategies are estimated from observed actions via Bayesian inference. Hyper-Q may be effective against many different types of adaptive agents, even if they are persistently dynamic. Against certain broad categories of adaptation, it is argued that Hyper-Q may converge to exact optimal time-varying policies. In tests using Rock-Paper-Scissors, Hyper-Q learns to significantly exploit an Infinitesimal Gradient Ascent (IGA) player, as well as a Policy Hill Climber (PHC) player. Preliminary analysis of Hyper-Q against itself is also presented.
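
To make the idea of estimating another agent's mixed strategy from observed actions more concrete, here is a minimal sketch for Rock-Paper-Scissors. It is not taken from the paper: the discretized strategy grid, the uniform prior, and the names simplex_grid, bayes_update, and posterior_mean are all assumptions chosen for illustration. The sketch also treats the opponent's strategy as fixed, so tracking an adapting opponent would need some way of discounting older observations, which is not shown here.

```python
ACTIONS = ("rock", "paper", "scissors")


def simplex_grid(n):
    """Hypothetical helper: all mixed strategies over 3 actions whose
    probabilities are multiples of 1/n (a discretized probability simplex)."""
    return [(i / n, j / n, (n - i - j) / n)
            for i in range(n + 1)
            for j in range(n + 1 - i)]


def bayes_update(posterior, grid, action_index):
    """One Bayes step: weight each candidate strategy by the probability
    it assigns to the observed action, then renormalize."""
    weights = [w * s[action_index] for w, s in zip(posterior, grid)]
    total = sum(weights)
    return [w / total for w in weights]


def posterior_mean(posterior, grid):
    """Point estimate of the opponent's mixed strategy (posterior mean)."""
    return tuple(sum(w * s[k] for w, s in zip(posterior, grid))
                 for k in range(len(ACTIONS)))


grid = simplex_grid(10)
posterior = [1.0 / len(grid)] * len(grid)  # assumed uniform prior

# Made-up observation sequence, for illustration only.
observed = ["rock", "rock", "paper", "rock", "scissors", "rock"]
for action in observed:
    posterior = bayes_update(posterior, grid, ACTIONS.index(action))

estimate = posterior_mean(posterior, grid)
print(dict(zip(ACTIONS, estimate)))
```

In a Hyper-Q style learner, an estimate like this would stand in for the opponent's current strategy when looking up or updating the value of one's own mixed strategy. How the paper itself represents and updates these values is not specified in the abstract above.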

Gerald Tesauro

Certain Broad Categories | NIPS 2003 | NIPS 2007 | Optimal Time-varying Policies | Q-Learning Require Knowledge |

Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2003
Where	NIPS
Authors	Gerald Tesauro

Sciweavers
