Abstract. We consider Reinforcement Learning for average reward zerosum stochastic games. We present and analyze two algorithms. The first is based on relative Q-learning and the ...
Stochastic Shortest Path problems (SSPs) can be efficiently dealt with by the Real-Time Dynamic Programming algorithm (RTDP). Yet, RTDP requires that a goal state is always reach...