Search Sciweavers | Sciweavers

91 search results - page 10 / 19

» Parameter-exploring policy gradients

ICML
2000
IEEE

126 views, Machine Learning

Reinforcement Learning in POMDP's via Direct Gradient Ascent

16 years 6 months ago

Download reference.kfupm.edu.sa

This paper discusses theoretical and experimental aspects of gradient-based approaches to the direct optimization of policy performance in controlled POMDPs. We introduce GPOMDP, a...

Jonathan Baxter, Peter L. Bartlett

IOR
2011

107 views

Information Collection on a Graph

15 years 29 days ago

Download www.castlelab.princeton.edu

We derive a knowledge gradient policy for an optimal learning problem on a graph, in which we use sequential measurements to refine Bayesian estimates of individual edge values i...

Ilya O. Ryzhov, Warren B. Powell

NIPS
2007

164 views, Information Technology

Incremental Natural Actor-Critic Algorithms

15 years 7 months ago

Download books.nips.cc

We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learning m...

Shalabh Bhatnagar, Richard S. Sutton, Mohammad Gha...

GECCO
2009
Springer

162 views, Optimization

Uncertainty handling CMA-ES for reinforcement learning

15 years 3 months ago

Download www.neuroinformatik.ruhr-uni-bochum.de

The covariance matrix adaptation evolution strategy (CMA-ES) has proven to be a powerful method for reinforcement learning (RL). Recently, the CMA-ES has been augmented with an ada...

Verena Heidrich-Meisner, Christian Igel

INFOCOM
1995
IEEE

122 views, Communications

Complexity of Gradient Projection Method for Optimal Routing in Data Networks

15 years 9 months ago

Download www.cs.ou.edu

In this paper, we derive a time-complexity bound for the gradient projection method for optimal routing in data networks. This result shows that the gradient projection algorith...

Wei Kang Tsai, John K. Antonio, Garng M. Huang
