Sciweavers

16 search results - page 3 / 4
» On Average Versus Discounted Reward Temporal-Difference Lear...
Sort
View
UAI
2001
13 years 9 months ago
The Optimal Reward Baseline for Gradient-Based Reinforcement Learning
There exist a number of reinforcement learning algorithms which learn by climbing the gradient of expected reward. Their long-run convergence has been proved, even in partially ob...
Lex Weaver, Nigel Tao
JMLR
2010
119views more  JMLR 2010»
13 years 2 months ago
A Convergent Online Single Time Scale Actor Critic Algorithm
Actor-Critic based approaches were among the first to address reinforcement learning in a general setting. Recently, these algorithms have gained renewed interest due to their gen...
Dotan Di Castro, Ron Meir
NN
2002
Springer
13 years 7 months ago
Opponent interactions between serotonin and dopamine
Anatomical and pharmacological evidence suggests that the dorsal raphe serotonin system and the ventral tegmental and substantia nigra dopamine system may act as mutual opponents....
Nathaniel D. Daw, Sham Kakade, Peter Dayan
ICML
2007
IEEE
14 years 8 months ago
Bayesian actor-critic algorithms
We1 present a new actor-critic learning model in which a Bayesian class of non-parametric critics, using Gaussian process temporal difference learning is used. Such critics model ...
Mohammad Ghavamzadeh, Yaakov Engel
BROADNETS
2004
IEEE
13 years 11 months ago
Efficient QoS Provisioning for Adaptive Multimedia in Mobile Communication Networks by Reinforcement Learning
The scarcity and large fluctuations of link bandwidth in wireless networks have motivated the development of adaptive multimedia services in mobile communication networks, where i...
Fei Yu, Vincent W. S. Wong, Victor C. M. Leung