Learning algorithms often obtain relatively low average payoffs in repeated general-sum games played against other learning agents, due to a focus on myopic best responses and one-shot Nash equilibrium (NE) strategies. A less myopic approach focuses instead on NEs of the repeated game, which suggests that (at a minimum) a learning agent should possess two properties. First, an agent should never learn to play a strategy whose average payoff falls below the minimax value of the game. Second, it should learn to cooperate and compromise when doing so is beneficial. No learning algorithm from the literature is known to possess both of these properties. We present a reinforcement learning algorithm (M-Qubed) that provably satisfies the first property and empirically displays the second property, in self play, across a wide range of games.
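To make the first property concrete, the minimax (security) value referenced above is the payoff a player can guarantee regardless of the opponent's play, and it can be computed by linear programming. The following is a minimal sketch, not taken from the paper: the function name `minimax_value` and the example payoff matrix are illustrative, and it assumes NumPy and SciPy are available.

```python
# Hypothetical sketch: compute the row player's minimax (security) value
# of a two-player matrix game via linear programming.
import numpy as np
from scipy.optimize import linprog

def minimax_value(A):
    """Return max over mixed strategies x of min over columns j of (x^T A)_j."""
    m, n = A.shape
    # Decision variables: the mixed strategy x (m entries) followed by the value v.
    c = np.zeros(m + 1)
    c[-1] = -1.0                      # linprog minimizes, so minimize -v
    # For each opponent column j: v - sum_i x_i * A[i, j] <= 0
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # x must be a probability distribution: sum_i x_i = 1
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]  # x_i >= 0, v unbounded
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]

# Illustrative payoffs (not from the paper): row player in the prisoner's
# dilemma with actions (Cooperate, Defect) against the same opponent actions.
pd = np.array([[3.0, 0.0],
               [5.0, 1.0]])
print(minimax_value(pd))  # 1.0: always defecting secures the minimax value
```

Under the first property, a learning agent in this illustrative game must never settle on a strategy averaging below 1.0; the second property asks it to additionally reach the mutual-cooperation payoff of 3.0 when the opponent makes that possible.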
Jacob W. Crandall, Michael A. Goodrich