It is known that the complexity of the reinforcement learning algorithms, such as Q-learning, may be exponential in the number of environment’s states. It was shown, however, th...
R-max is a very simple model-based reinforcement learning algorithm which can attain near-optimal average reward in polynomial time. In R-max, the agent always maintains a complet...
Abstract. We investigate the problem of using function approximation in reinforcement learning where the agent’s policy is represented as a classifier mapping states to actions....
Reinforcement learning is an effective technique for learning action policies in discrete stochastic environments, but its efficiency can decay exponentially with the size of the ...
Temporal difference (TD) learning methods [22] have become popular reinforcement learning techniques in recent years. TD methods have had some experimental successes and have been...