We define the problem of inferring a “mixture of Markov chains” based on observing a stream of interleaved outputs from these chains. We show a sharp characterization of the i...
We give the first polynomial time prediction strategy for any PAC-learnable class $C$ that probabilistically predicts the target with mistake probability $\mathrm{poly}(\log t)/t = \tilde{O}(1/t)$ w...
Abstract. We consider Reinforcement Learning for average reward zero-sum stochastic games. We present and analyze two algorithms. The first is based on relative Q-learning and the ...