In this paper we focus on an important source of problem– difficulty in (online) dynamic optimization problems that has so far received significantly less attention than the tr...
We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learning m...
Shalabh Bhatnagar, Richard S. Sutton, Mohammad Gha...
— This paper presents a new reinforcement learning algorithm for accelerating acquisition of new skills by real mobile robots, without requiring simulation. It speeds up Q-learni...
Abstract. We consider a control problem where the decision maker interacts with a standard Markov decision process with the exception that the reward functions vary arbitrarily ove...
We consider reinforcement learning in the parameterized setup, where the model is known to belong to a parameterized family of Markov Decision Processes (MDPs). We further impose ...