Sciweavers

109 search results - page 11 / 22
» Policy teaching through reward function learning
Sort
View
NECO
2007
258views more  NECO 2007»
13 years 7 months ago
Reinforcement Learning Through Modulation of Spike-Timing-Dependent Synaptic Plasticity
The persistent modification of synaptic efficacy as a function of the relative timing of pre- and postsynaptic spikes is a phenomenon known as spiketiming-dependent plasticity (...
Razvan V. Florian
ATAL
2008
Springer
13 years 9 months ago
Sigma point policy iteration
In reinforcement learning, least-squares temporal difference methods (e.g., LSTD and LSPI) are effective, data-efficient techniques for policy evaluation and control with linear v...
Michael H. Bowling, Alborz Geramifard, David Winga...
ICML
2002
IEEE
14 years 8 months ago
Reinforcement Learning and Shaping: Encouraging Intended Behaviors
We explore dynamic shaping to integrate our prior beliefs of the final policy into a conventional reinforcement learning system. Shaping provides a positive or negative artificial...
Adam Laud, Gerald DeJong
COLT
2008
Springer
13 years 9 months ago
Adapting to a Changing Environment: the Brownian Restless Bandits
In the multi-armed bandit (MAB) problem there are k distributions associated with the rewards of playing each of k strategies (slot machine arms). The reward distributions are ini...
Aleksandrs Slivkins, Eli Upfal
IJCNN
2008
IEEE
14 years 2 months ago
Uncertainty propagation for quality assurance in Reinforcement Learning
— In this paper we address the reliability of policies derived by Reinforcement Learning on a limited amount of observations. This can be done in a principled manner by taking in...
Daniel Schneegaß, Steffen Udluft, Thomas Mar...