Sciweavers

437 search results - page 11 / 88
» Policy Gradient Critics
ICML
2009
IEEE
14 years 8 months ago
Monte-Carlo simulation balancing
In this paper we introduce the first algorithms for efficiently learning a simulation policy for Monte-Carlo search. Our main idea is to optimise the balance of a simulation polic...
David Silver, Gerald Tesauro
JMLR
2006
13 years 7 months ago
Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation
We study a sequential variance reduction technique for Monte Carlo estimation of functionals in Markov Chains. The method is based on designing sequential control variates using s...
Rémi Munos
EWRL
2008
13 years 9 months ago
Policy Learning - A Unified Perspective with Applications in Robotics
Policy Learning approaches are among the best suited methods for high-dimensional, continuous control systems such as anthropomorphic robot arms and humanoid robots. In this paper,...
Jan Peters, Jens Kober, Duy Nguyen-Tuong
MICRO
2005
IEEE
14 years 1 month ago
A Criticality Analysis of Clustering in Superscalar Processors
Clustered machines partition hardware resources to circumvent the cycle time penalties incurred by large, monolithic structures. This partitioning introduces a long inter-cluster ...
Pierre Salverda, Craig B. Zilles
NIPS
2003
13 years 9 months ago
Bounded Finite State Controllers
We describe a new approximation algorithm for solving partially observable MDPs. Our bounded policy iteration approach searches through the space of bounded-size, stochastic fini...
Pascal Poupart, Craig Boutilier