This paper investigates the application of reinforcement learning to Tetris. In particular, temporal difference learning is applied to estimate the state value funct...
We provide a provably efficient algorithm for learning Markov Decision Processes (MDPs) with continuous state and action spaces in the online setting. Specifically, we take a mo...
In this paper we define and address the problem of safe exploration in the context of reinforcement learning. Our notion of safety is concerned with states or transitions that can ...
We address two open theoretical questions in Policy Gradient Reinforcement Learning. The first concerns the efficacy of using function approximation to represent the state-action ...
This paper extends the link between evolutionary game theory and multi-agent reinforcement learning to multistate games. In previous work, we introduced piecewise replicator dynam...