Search Sciweavers | Sciweavers

» Optimizing interconnection policies

NIPS
2001

Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning

Policy gradient methods for reinforcement learning avoid some of the undesirable properties of the value function approaches, such as policy degradation (Baxter and Bartlett, 2001...

Evan Greensmith, Peter L. Bartlett, Jonathan Baxte...

JMLR
2006

Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation

We study a sequential variance reduction technique for Monte Carlo estimation of functionals in Markov Chains. The method is based on designing sequential control variates using s...

Rémi Munos

TON
2008

Integration of explicit effective-bandwidth-based QoS routing with best-effort routing

This paper presents a methodology for protecting low-priority best-effort (BE) traffic in a network domain that provides both virtual-circuit routing with bandwidth reservation for...

Stephen L. Spitler, Daniel C. Lee

INFOCOM
2010
IEEE

Change Management in Enterprise IT Systems: Process Modeling and Capacity-optimal Scheduling

Abstract—We provide a formal model for the Change Management process for Enterprise IT systems, and develop change scheduling algorithms that seek to attain the “change capacit...

Praveen Kumar Muthuswamy, Koushik Kar, Sambit Sahu...

COLT
2010
Springer

Best Arm Identification in Multi-Armed Bandits

We consider the problem of finding the best arm in a stochastic multi-armed bandit game. The regret of a forecaster is here defined by the gap between the mean reward of the optim...

Jean-Yves Audibert, Sébastien Bubeck, Ré...
