Search Sciweavers | Sciweavers

95 search results - page 11 / 19

» Policy Gradients for Cryptanalysis

189

click to vote

CIS
2005
Springer

129views Applied Computing» more CIS 2005»

An RLS-Based Natural Actor-Critic Algorithm for Locomotion of a Two-Linked Robot Arm

16 years 11 days ago

Download www-clmc.usc.edu

Recently, actor-critic methods have drawn much interests in the area of reinforcement learning, and several algorithms have been studied along the line of the actor-critic strategy...

Jooyoung Park, Jongho Kim, Daesung Kang

claim paper

Read More »

172

click to vote

ICML
2008
IEEE

165views Machine Learning» more ICML 2008»

A worst-case comparison between temporal difference and residual gradient with linear function approximation

16 years 7 months ago

Download www.research.rutgers.edu

Residual gradient (RG) was proposed as an alternative to TD(0) for policy evaluation when function approximation is used, but there exists little formal analysis comparing them ex...

Lihong Li

claim paper

Read More »

194

click to vote

AAAI
2011

144views Intelligent Agents» more AAAI 2011»

Differential Eligibility Vectors for Advantage Updating and Gradient Methods

14 years 6 months ago

Download gaips.inesc-id.pt

In this paper we propose differential eligibility vectors (DEV) for temporal-difference (TD) learning, a new class of eligibility vectors designed to bring out the contribution of...

Francisco S. Melo

claim paper

Read More »

198

click to vote

ICML
2007
IEEE

180views Machine Learning» more ICML 2007»

Bayesian actor-critic algorithms

16 years 7 months ago

Download www.machinelearning.org

We1 present a new actor-critic learning model in which a Bayesian class of non-parametric critics, using Gaussian process temporal difference learning is used. Such critics model ...

Mohammad Ghavamzadeh, Yaakov Engel

claim paper

Read More »

166

Voted

NIPS
2003

128views Information Technology» more NIPS 2003»

Distributed Optimization in Adaptive Networks

15 years 8 months ago

Download books.nips.cc

We develop a protocol for optimizing dynamic behavior of a network of simple electronic components, such as a sensor network, an ad hoc network of mobile devices, or a network of ...

Ciamac Cyrus Moallemi, Benjamin Van Roy

claim paper

Read More »

« Prev « First page 11 / 19 Last » Next »

Sciweavers

Explore & Download

Productivity Tools

Sciweavers