Sciweavers

437 search results - page 11 / 88
» Policy Gradient Critics
Sort
View
ICML
2009
IEEE
16 years 3 months ago
Monte-Carlo simulation balancing
In this paper we introduce the first algorithms for efficiently learning a simulation policy for Monte-Carlo search. Our main idea is to optimise the balance of a simulation polic...
David Silver, Gerald Tesauro
122
Voted
JMLR
2006
143views more  JMLR 2006»
15 years 2 months ago
Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation
We study a sequential variance reduction technique for Monte Carlo estimation of functionals in Markov Chains. The method is based on designing sequential control variates using s...
Rémi Munos
142
Voted
EWRL
2008
15 years 4 months ago
Policy Learning - A Unified Perspective with Applications in Robotics
Policy Learning approaches are among the best suited methods for high-dimensional, continuous control systems such as anthropomorphic robot arms and humanoid robots. In this paper,...
Jan Peters, Jens Kober, Duy Nguyen-Tuong
MICRO
2005
IEEE
123views Hardware» more  MICRO 2005»
15 years 8 months ago
A Criticality Analysis of Clustering in Superscalar Processors
Clustered machines partition hardware resources to circumvent the cycle time penalties incurred by large, monolithic structures. This partitioning introduces a long inter-cluster ...
Pierre Salverda, Craig B. Zilles
NIPS
2003
15 years 4 months ago
Bounded Finite State Controllers
We describe a new approximation algorithm for solving partially observable MDPs. Our bounded policy iteration approach searches through the space of bounded-size, stochastic fini...
Pascal Poupart, Craig Boutilier