Reinforcement Learning for Average Reward Zero-Sum Games

Abstract. We consider Reinforcement Learning for average reward zero-sum stochastic games. We present and analyze two algorithms. The first is based on relative Q-learning and the second on Q-learning for stochastic shortest path games. Convergence is proved using the ODE (Ordinary Differential Equation) method. We further discuss the case where not all the actions are played by the opponent with comparable frequencies and present an algorithm that converges to the optimal Q-function, given the observed play of the opponent.
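
For orientation, a relative Q-learning update for a zero-sum game typically takes the following general form. This is a sketch under assumed notation, not necessarily the paper's exact statement: s is the current state, a and b are the actions of the maximizer and the minimizer, s' is the observed next state, \gamma_n is a step size, and f is a reference function such as f(Q) = Q(s_0, a_0, b_0) for a fixed triple (all of these symbols are assumptions introduced here).

\[
Q_{n+1}(s,a,b) = Q_n(s,a,b) + \gamma_n \Bigl( r(s,a,b) + \operatorname{val}\bigl[ Q_n(s',\cdot,\cdot) \bigr] - f(Q_n) - Q_n(s,a,b) \Bigr)
\]

\[
\operatorname{val}[M] = \max_{x \in \Delta(A)} \min_{b \in B} \sum_{a \in A} x(a)\, M(a,b)
\]

Here val[M] is the minimax value of the matrix game M over mixed strategies x of the maximizer. Subtracting f(Q_n) keeps the iterates from drifting, and in schemes of this style it plays the role of a running estimate of the average reward.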

Shie Mannor

COLT 2004 | Ordinary Differential Equation | Stochastic Shortest Path | Zero-Sum Stochastic Games

Post Info

Added	01 Jul 2010
Updated	01 Jul 2010
Type	Conference
Year	2004
Where	COLT
Authors	Shie Mannor
