Sciweavers

95 search results - page 11 / 19
» Policy Gradients for Cryptanalysis
Sort
View
CIS
2005
Springer
14 years 1 months ago
An RLS-Based Natural Actor-Critic Algorithm for Locomotion of a Two-Linked Robot Arm
Recently, actor-critic methods have drawn much interests in the area of reinforcement learning, and several algorithms have been studied along the line of the actor-critic strategy...
Jooyoung Park, Jongho Kim, Daesung Kang
ICML
2008
IEEE
14 years 8 months ago
A worst-case comparison between temporal difference and residual gradient with linear function approximation
Residual gradient (RG) was proposed as an alternative to TD(0) for policy evaluation when function approximation is used, but there exists little formal analysis comparing them ex...
Lihong Li
AAAI
2011
12 years 7 months ago
Differential Eligibility Vectors for Advantage Updating and Gradient Methods
In this paper we propose differential eligibility vectors (DEV) for temporal-difference (TD) learning, a new class of eligibility vectors designed to bring out the contribution of...
Francisco S. Melo
ICML
2007
IEEE
14 years 8 months ago
Bayesian actor-critic algorithms
We1 present a new actor-critic learning model in which a Bayesian class of non-parametric critics, using Gaussian process temporal difference learning is used. Such critics model ...
Mohammad Ghavamzadeh, Yaakov Engel
NIPS
2003
13 years 9 months ago
Distributed Optimization in Adaptive Networks
We develop a protocol for optimizing dynamic behavior of a network of simple electronic components, such as a sensor network, an ad hoc network of mobile devices, or a network of ...
Ciamac Cyrus Moallemi, Benjamin Van Roy