An intrinsic reward mechanism for efficient exploration



How should a reinforcement learning agent act if its sole purpose is to efficiently learn an optimal policy for later use? In other words, how should it explore in order to exploit later? We formulate this problem as a Markov Decision Process by explicitly modeling the internal state of the agent and propose a principled heuristic for its solution. We present experimental results in a number of domains, also exploring the algorithm's use for learning a policy for a skill given its reward function, an important but neglected component of skill discovery.
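The abstract does not spell out the intrinsic reward itself, so the following is only a minimal sketch of the general "explore now, exploit later" idea, not the authors' algorithm. It assumes a stand-in intrinsic reward (the size of the change a step causes in the agent's task value estimate, treated here as the agent's internal state), a hypothetical five-state chain environment, and illustrative values for `ALPHA`, `GAMMA`, and `EPSILON`. All function names are invented for this example.

```python
import numpy as np

N_STATES, GOAL = 5, 4                    # hypothetical chain: states 0..4, goal at the right end
LEFT, RIGHT = 0, 1
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2    # illustrative values, not taken from the paper


def step(state, action):
    """Deterministic chain dynamics; external reward 1 only on reaching GOAL."""
    nxt = max(state - 1, 0) if action == LEFT else state + 1
    return nxt, (1.0 if nxt == GOAL else 0.0)


def q_update(q, s, a, r, s2):
    """One Q-learning backup on table q; returns the absolute change to q[s, a]."""
    bootstrap = 0.0 if s2 == GOAL else q[s2].max()
    delta = ALPHA * (r + GAMMA * bootstrap - q[s, a])
    q[s, a] += delta
    return abs(delta)


def explore(n_episodes, seed=0):
    """Act to maximise intrinsic reward while learning task values off-policy."""
    rng = np.random.default_rng(seed)
    q_task = np.zeros((N_STATES, 2))      # values for the external (task) reward
    q_explore = np.zeros((N_STATES, 2))   # values for the intrinsic reward
    for _ in range(n_episodes):
        s = 0
        while s != GOAL:
            if rng.random() < EPSILON:
                a = int(rng.integers(2))
            else:
                # Greedy on the exploration values, ties broken at random.
                best = np.flatnonzero(q_explore[s] == q_explore[s].max())
                a = int(rng.choice(best))
            s2, r_ext = step(s, a)
            # Assumed intrinsic reward: how much this step changed the task estimate.
            r_int = q_update(q_task, s, a, r_ext, s2)
            q_update(q_explore, s, a, r_int, s2)
            s = s2
    return q_task


q_task = explore(n_episodes=20)
# "Exploit later": the policy for later use is greedy on the task values only.
greedy_policy = q_task[:GOAL].argmax(axis=1)
```

The design choice illustrated here is the separation of two value tables: behaviour during exploration is driven by `q_explore`, while the policy kept for later use is read off `q_task`. On this chain the resulting `greedy_policy` selects `RIGHT` in every non-goal state.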

Özgür Simsek, Andrew G. Barto


ICML 2006 | Machine Learning | Markov Decision Process | Reinforcement Learning Agent | Skill Discovery


» Cortical network reorganization guided by sensory input features

» R-MAX: A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning

» Hardware/software support for adaptive work-stealing in on-chip multiprocessor

» A Study of Adaptive Locomotive Behaviors of a Biped Robot Patterns Generation and Classifi...

Post Info

Added	17 Nov 2009
Updated	17 Nov 2009
Type	Conference
Year	2006
Where	ICML
Authors	Özgür Simsek, Andrew G. Barto

