We present a new actor-critic learning model in which a Bayesian class of non-parametric critics, based on Gaussian process temporal difference learning, is used. Such critics model the state-action value function as a Gaussian process, allowing Bayes' rule to be used in computing the posterior distribution over state-action value functions, conditioned on the observed data. Appropriate choices of the prior covariance (kernel) between state-action values and of the parametrization of the policy allow us to obtain closed-form expressions for the posterior distribution of the gradient of the average discounted return with respect to the policy parameters. The posterior mean, which serves as our estimate of the policy gradient, is used to update the policy, while the posterior covariance allows us to gauge the reliability of the update.
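To make concrete why a closed-form posterior over the policy gradient is obtainable, the following display is a minimal sketch in standard policy-gradient notation (the symbols are assumptions, not necessarily those used later in the paper): write $z=(x,a)$ for a state-action pair, $\mu(a \mid x;\boldsymbol{\theta})$ for the parametrized policy, $\nu(z;\boldsymbol{\theta})$ for the discounted state-action occupancy measure, and $Q(z)$ for the state-action value function. The gradient of the expected return is then linear in $Q$,
\[
\nabla_{\boldsymbol{\theta}}\,\eta(\boldsymbol{\theta})
= \int \nu(z;\boldsymbol{\theta})\,
  \nabla_{\boldsymbol{\theta}} \log \mu(a \mid x;\boldsymbol{\theta})\,
  Q(z)\, dz .
\]
Because this functional is linear in $Q$, a Gaussian process posterior over $Q$ induces a Gaussian posterior over $\nabla_{\boldsymbol{\theta}}\,\eta(\boldsymbol{\theta})$, whose mean and covariance can be expressed in terms of the kernel for suitable choices of prior covariance and policy parametrization.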