In this paper, we present a reinforcement learning approach for mapping natural language instructions to sequences of executable actions. We assume access to a reward function tha...
S. R. K. Branavan, Harr Chen, Luke S. Zettlemoyer,...
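This line of work trains a log-linear (softmax) policy with a policy-gradient update driven by the assumed reward signal. Below is a minimal sketch of that style of learner; the toy environment, the feature map phi, and all constants are illustrative stand-ins, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: states and actions are integers, an "instruction" is an
# integer tag; phi() is an illustrative feature map (not from the paper).
N_STATES, N_ACTIONS, N_FEATS = 4, 3, 24

def phi(state, instr, action):
    """One-hot feature vector over (state, action, instruction-parity)."""
    v = np.zeros(N_FEATS)
    v[(state * N_ACTIONS + action) * 2 + (instr % 2)] = 1.0
    return v

def action_probs(w, state, instr):
    """Log-linear (softmax) policy over actions."""
    scores = np.array([w @ phi(state, instr, a) for a in range(N_ACTIONS)])
    scores -= scores.max()                      # numerical stability
    p = np.exp(scores)
    return p / p.sum()

def rollout(w, instr, horizon=5):
    """Sample an episode; reward 1 if we end in state 0 (toy task)."""
    state, grad = 0, np.zeros(N_FEATS)
    for _ in range(horizon):
        p = action_probs(w, state, instr)
        a = rng.choice(N_ACTIONS, p=p)
        # REINFORCE score function: phi(s, a) - E_pi[phi(s, .)]
        grad += phi(state, instr, a) - sum(
            p[b] * phi(state, instr, b) for b in range(N_ACTIONS))
        state = (state + a) % N_STATES          # toy deterministic dynamics
    reward = 1.0 if state == 0 else 0.0
    return reward, grad

w, alpha = np.zeros(N_FEATS), 0.1
for episode in range(2000):
    r, g = rollout(w, instr=episode % 2)
    w += alpha * r * g                          # policy-gradient update
```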
We consider an MDP setting in which the reward function is allowed to change during each time step of play (possibly in an adversarial manner), yet the dynamics remain fixed. Simi...
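In the degenerate single-state case, an MDP with fixed dynamics and adversarially changing rewards reduces to prediction with expert advice. The sketch below shows only that multiplicative-weights core; the paper's actual algorithm additionally handles multi-state dynamics (via mixing-time arguments), which this toy omits.

```python
import numpy as np

rng = np.random.default_rng(1)

# Single-state reduction: the reward vector changes every step, the learner
# competes with the best fixed action in hindsight via exponential weights.
N_ACTIONS, T, eta = 5, 1000, 0.1
weights = np.ones(N_ACTIONS)
total, best_fixed = 0.0, np.zeros(N_ACTIONS)

for t in range(T):
    probs = weights / weights.sum()
    action = rng.choice(N_ACTIONS, p=probs)
    reward_t = rng.random(N_ACTIONS)     # stand-in for adversarial rewards
    total += reward_t[action]
    best_fixed += reward_t
    # Full-information exponential-weights update.
    weights *= np.exp(eta * reward_t)

print(f"learner: {total:.1f}  best fixed action: {best_fixed.max():.1f}")
```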
We study the problem of an apprentice learning to behave in an environment with an unknown reward function by observing the behavior of an expert. We follow on the work of Abbeel ...
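Apprenticeship-learning methods in this line compare the expert's discounted feature expectations against the learner's. A self-contained sketch of that quantity, with illustrative states, trajectories, and feature map:

```python
import numpy as np

GAMMA, N_FEATS = 0.95, 6

def feature_expectations(trajectories, phi):
    """Empirical discounted feature expectations mu = E[sum_t gamma^t phi(s_t)].
    Both Abbeel & Ng's formulation and the game-theoretic one compare the
    expert's mu against the learner's."""
    mu = np.zeros(N_FEATS)
    for traj in trajectories:
        for t, state in enumerate(traj):
            mu += (GAMMA ** t) * phi(state)
    return mu / len(trajectories)

# Illustrative stand-ins: states are integers, phi is a one-hot feature map.
phi = lambda s: np.eye(N_FEATS)[s % N_FEATS]
expert_trajs = [[0, 1, 2, 2, 2], [0, 2, 2, 2, 2]]
learner_trajs = [[0, 1, 1, 3, 4]]

mu_E = feature_expectations(expert_trajs, phi)
mu_L = feature_expectations(learner_trajs, phi)
print("distance to expert:", np.linalg.norm(mu_E - mu_L))
```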
In this paper, the application of reinforcement learning to Tetris is investigated; in particular, temporal difference learning is applied to estimate the state value funct...
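A minimal sketch of the core update: TD(0) with a linear value function V(s) = w·φ(s), which is how Tetris state values are typically estimated from hand-crafted board features (column heights, holes, etc.). A toy chain environment stands in for Tetris here, so everything below is illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# TD(0) with linear function approximation on a toy random-walk chain.
N_STATES, GAMMA, ALPHA = 7, 0.9, 0.05
phi = lambda s: np.eye(N_STATES)[s]          # tabular = one-hot features
w = np.zeros(N_STATES)

for episode in range(500):
    s = N_STATES // 2
    while 0 < s < N_STATES - 1:
        s_next = s + rng.choice([-1, 1])     # random-walk dynamics
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        done = s_next in (0, N_STATES - 1)
        target = r + (0.0 if done else GAMMA * (w @ phi(s_next)))
        # TD(0) update: move V(s) toward the bootstrapped one-step target.
        w += ALPHA * (target - w @ phi(s)) * phi(s)
        s = s_next

print(np.round(w, 2))   # learned state values of the chain
```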
The precise specification of reward functions for Markov decision processes (MDPs) is often extremely difficult, motivating research into both reward elicitation and the robust so...
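As a toy illustration of why imprecise rewards complicate planning (not the paper's minimax-regret machinery): run value iteration under the pessimistic and optimistic extremes of per-state reward intervals, and treat the gap as a flag for where elicitation would help most. All quantities below are randomly generated stand-ins.

```python
import numpy as np

rng = np.random.default_rng(3)

# Rewards known only to intervals [r_lo, r_hi]; compare the extremes.
N_S, N_A, GAMMA = 5, 2, 0.9
P = rng.dirichlet(np.ones(N_S), size=(N_S, N_A))   # P[s, a] -> dist over s'
r_lo = rng.random(N_S)
r_hi = r_lo + rng.random(N_S)

def value_iteration(r, iters=200):
    V = np.zeros(N_S)
    for _ in range(iters):
        V = r + GAMMA * np.einsum('sat,t->sa', P, V).max(axis=1)
    return V

gap = value_iteration(r_hi) - value_iteration(r_lo)
print("elicitation priority (value gap per state):", np.round(gap, 2))
```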
In designing Markov Decision Processes (MDPs), one must define the world, its dynamics, a set of actions, and a reward function. MDPs are often applied in situations where there i...
David L. Roberts, Sooraj Bhat, Kenneth St. Clair, ...
We present a method for inferring the behavior styles of character controllers from a small set of examples. We show that a rich set of behavior variations can be captured by dete...
This paper describes a general approach for the unsupervised learning of behaviors in a behavior-based robot. The key idea is to formalize a behavior produced by a Motor Map dr...
Paolo Arena, Luigi Fortuna, Mattia Frasca, Luca Pa...
Inverse reinforcement learning addresses the general problem of recovering a reward function from samples of a policy provided by an expert/demonstrator. In this paper, w...
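The abstract is truncated before the paper's own method, so as a generic illustration of the stated problem, here is a simplified version of the classic linear-programming IRL formulation for finite MDPs (Ng & Russell, 2000): find a bounded reward under which the demonstrated policy is weakly optimal, maximizing the total optimality margin. The usual L1 sparsity penalty is omitted for brevity, and the MDP is randomly generated.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)

N_S, N_A, GAMMA = 4, 3, 0.9
P = rng.dirichlet(np.ones(N_S), size=(N_S, N_A))   # P[s, a] -> dist over s'
expert = rng.integers(N_A, size=N_S)               # expert policy pi(s)

P_pi = P[np.arange(N_S), expert]                   # dynamics under pi
inv_term = np.linalg.inv(np.eye(N_S) - GAMMA * P_pi)

rows = []
for s in range(N_S):
    for a in range(N_A):
        if a != expert[s]:
            # Optimality of pi requires
            # (P_pi[s] - P[s, a]) (I - gamma P_pi)^-1 R >= 0 for all a.
            rows.append((P_pi[s] - P[s, a]) @ inv_term)
M = np.array(rows)

res = linprog(c=-M.sum(axis=0),                    # maximize total margin
              A_ub=-M, b_ub=np.zeros(len(M)),      # margins must be >= 0
              bounds=[(-1.0, 1.0)] * N_S)
print("recovered reward:", np.round(res.x, 2))
```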
The paper explores a very simple agent design method called Q-decomposition, wherein a complex agent is built from simpler subagents. Each subagent has its own reward function and...
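A minimal sketch of the Q-decomposition idea: each subagent maintains a Q-table for its own reward component, updated on-policy with the arbitrator's actual actions (SARSA-style), while the arbitrator acts greedily on the sum of subagent Q-values. The toy world, reward split, and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

N_S, N_A, GAMMA, ALPHA, EPS = 6, 2, 0.9, 0.1, 0.1
Q = np.zeros((2, N_S, N_A))                 # one Q-table per subagent

def subrewards(s):
    # Subagent 0 likes the left end, subagent 1 likes the right end.
    return np.array([1.0 if s == 0 else 0.0, 1.0 if s == N_S - 1 else 0.0])

def arbitrate(s):
    if rng.random() < EPS:
        return int(rng.integers(N_A))
    return int(Q[:, s].sum(axis=0).argmax())   # argmax_a sum_j Q_j(s, a)

s, a = N_S // 2, arbitrate(N_S // 2)
for step in range(20000):
    s2 = max(0, min(N_S - 1, s + (1 if a == 1 else -1)))
    a2 = arbitrate(s2)
    r = subrewards(s2)
    for j in range(2):                      # per-subagent SARSA update
        Q[j, s, a] += ALPHA * (r[j] + GAMMA * Q[j, s2, a2] - Q[j, s, a])
    s, a = s2, a2

print("arbitrator's greedy action per state:",
      [int(Q[:, s].sum(axis=0).argmax()) for s in range(N_S)])
```

The on-policy update matters in this design: if each subagent instead bootstrapped off its own locally greedy action (plain Q-learning), it would evaluate actions the arbitrator never takes, so the subagent values would not compose correctly.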