Contextual bandit learning is a reinforcement learning problem where the learner repeatedly receives a set of features (context), takes an action and receives a reward based on th...
We present an actor-critic scheme for reinforcement learning in complex domains. The main contribution is to show that planning and I/O dynamics can be separated such that an intra...
Pedro Alejandro Ortega, Daniel Alexander Braun, Si...
In this paper, we confront the problem of applying reinforcement learning to agents that perceive the environment through many sensors and that can perform parallel actions using ...
Reinforcement learning models generally assume that a stimulus is presented that allows a learner to unambiguously identify the state of nature, and the reward received is drawn f...
Tobias Larsen, David S. Leslie, Edmund J. Collins,...
Distributed-air-jet MEMS-based systems have been proposed to manipulate small parts with high velocities and without any friction problems. The control of such distributed systems ...
Abstract— Reinforcement learning algorithms have been successfully applied in robotics to learn how to solve tasks based on reward signals obtained during task execution. These r...
Reinforcement learning is based on exploration of the environment and receiving reward that indicates which actions taken by the agent are good and which ones are bad. In many app...
A new training algorithm is presented for delayed reinforcement learning problems that does not assume the existence of a critic model and employs the polytope optimization algorit...
In reinforcement learning, an agent tries to learn a policy, i.e., how to select an action in a given state of the environment, so that it maximizes the total amount of reward it ...
Computer models can be used to investigate the role of emotion in learning. Here we present EARL, our framework for the systematic study of the relation between emotion, adaptation...