We consider reinforcement learning as solving a Markov decision process with unknown transition distribution. Based on interaction with the environment, an estimate of the transition matrix is obtained, from which the optimal decision policy is formed. The classical maximum likelihood point estimate of the transition model does not reflect the uncertainty in that estimate, and the resulting policies may consequently lack a sufficient degree of exploration. We consider a Bayesian alternative that maintains a distribution over the transition model, so that the resulting policy takes into account the limited experience of the environment. The resulting algorithm is formally intractable, and we discuss two approximate solution methods, Variational Bayes and Expectation Propagation.
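For concreteness, one standard way to maintain such a distribution over a discrete transition model (an illustrative conjugate construction, not necessarily the parameterisation adopted in the body of the paper) is to place an independent Dirichlet prior on each row $\theta_{s,a}$ of the transition matrix. Writing $n(s' \mid s,a)$ for the observed transition counts and $\alpha$ for the prior pseudo-counts, the posterior and its mean are
\[
p(\theta_{s,a} \mid \mathcal{D}) = \mathrm{Dir}\!\left(\theta_{s,a};\, \alpha + n_{s,a}\right),
\qquad
\mathbb{E}\!\left[\theta_{s,a,s'} \mid \mathcal{D}\right]
= \frac{\alpha_{s'} + n(s' \mid s,a)}{\sum_{s''}\bigl(\alpha_{s''} + n(s'' \mid s,a)\bigr)},
\]
whereas the maximum likelihood estimate $n(s' \mid s,a) / \sum_{s''} n(s'' \mid s,a)$ is a single point that discards the posterior spread on which exploration could be based.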