Contextual bandit learning is a reinforcement learning problem where the learner repeatedly receives a set of features (context), takes an action and receives a reward based on th...
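The interaction loop described in this snippet (observe a context, choose an action, receive a reward) can be sketched as follows. This is a minimal illustration only, not the method of the paper being summarized: the two-context environment, the reward probabilities, the epsilon-greedy rule, and every name in the code are assumptions made for the example.

```python
import random


def run_contextual_bandit(n_rounds=2000, n_actions=3, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy learner on a toy contextual bandit.

    Everything here is hypothetical: the environment, the reward
    probabilities, and the learning rule are chosen for illustration.
    """
    rng = random.Random(seed)
    contexts = [0, 1]
    # Assumed environment: the best action depends on the context.
    best_action = {0: 0, 1: 2}

    # Running mean reward estimate for each (context, action) pair.
    counts = {(c, a): 0 for c in contexts for a in range(n_actions)}
    values = {(c, a): 0.0 for c in contexts for a in range(n_actions)}
    total_reward = 0.0

    for _ in range(n_rounds):
        # 1. The learner receives a context.
        context = rng.choice(contexts)

        # 2. The learner takes an action (explore with probability epsilon).
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: values[(context, a)])

        # 3. The learner receives a reward for the chosen action only.
        p_success = 0.9 if action == best_action[context] else 0.1
        reward = 1.0 if rng.random() < p_success else 0.0

        # Incremental update of the mean reward estimate.
        key = (context, action)
        counts[key] += 1
        values[key] += (reward - values[key]) / counts[key]
        total_reward += reward

    return values, total_reward


if __name__ == "__main__":
    estimates, total = run_contextual_bandit()
    print("total reward:", total)
    for key in sorted(estimates):
        print(key, round(estimates[key], 3))
```

The learner only ever sees the reward of the action it actually took, which is what separates this setting from supervised learning, where the correct label is revealed for every example.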
Lyapunov design methods are used widely in control engineering to design controllers that achieve qualitative objectives, such as stabilizing a system or maintaining a system'...
Hierarchical reinforcement learning is a general framework that attempts to accelerate policy learning in large domains. On the other hand, policy gradient reinforcement learning...
This paper describes two experiments with supervised reinforcement learning (RL) on a real mobile robot. Two types of experiments were performed. One tests the robot’s relia...
Abstract: Classification-based reinforcement learning (RL) methods have recently been proposed as an alternative to the traditional value-function based methods. These methods use...