Combining manual feedback with subsequent MDP reward signals for reinforcement learning

As learning agents move from research labs to the real world, it is increasingly important that human users, including those without programming skills, be able to teach agents desired behaviors. Recently, the TAMER framework was introduced for designing agents that can be interactively shaped by human trainers who give only positive and negative feedback signals. Past work on TAMER showed that shaping can greatly reduce the sample complexity required to learn a good policy, can enable lay users to teach agents the behaviors they desire, and can allow agents to learn within a Markov Decision Process (MDP) in the absence of a coded reward function. However, TAMER does not allow this human training to be combined with autonomous learning based on such a coded reward function. This paper leverages the fast learning exhibited within the TAMER framework to hasten a reinforcement learning (RL) algorithm's climb up the learning curve, effectively demonstrating that human reinforcement a...

W. Bradley Knox, Peter Stone

ATAL 2010 | Human | Intelligent Agents | MDP Reward | Reinforcement Learning

» Learning and Decision Making in Human During a Game of Matching Pennies

» Interactively shaping agents via human reinforcement: the TAMER framework

Post Info

Added	08 Nov 2010
Updated	08 Nov 2010
Type	Conference
Year	2010
Where	ATAL
Authors	W. Bradley Knox, Peter Stone

Sciweavers
