Near-Bayesian exploration in polynomial time

We consider the exploration/exploitation problem in reinforcement learning (RL). The Bayesian approach to model-based RL offers an elegant solution to this problem, by considering a distribution over possible models and acting to maximize expected reward; unfortunately, the Bayesian solution is intractable for all but very restricted cases. In this paper we present a simple algorithm, and prove that with high probability it is able to perform ε-close to the true (intractable) optimal Bayesian policy after some small (polynomial in quantities describing the system) number of time steps. The algorithm and analysis are motivated by the so-called PAC-MDP approach, and extend such results into the setting of Bayesian RL. In this setting, we show that we can achieve lower sample complexity bounds than existing algorithms, while using an exploration strategy that is much greedier than the (extremely cautious) exploration of PAC-MDP algorithms.
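The abstract does not spell out the algorithm itself, but the general approach it motivates is model-based RL with a posterior over models and an exploration incentive that fades with experience. Below is a minimal illustrative sketch under those assumptions: a tabular MDP with a known reward table, a Dirichlet posterior over next-state distributions for each state-action pair, and a count-based bonus of the form beta / (1 + counts). The class name, bonus form, and parameter choices are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

class ExplorationBonusAgent:
    """Illustrative sketch (not the paper's exact algorithm): keep Dirichlet
    posteriors over transitions and plan on the posterior-mean MDP plus a
    count-based exploration bonus that shrinks as (s, a) pairs are visited."""

    def __init__(self, n_states, n_actions, rewards, gamma=0.95, beta=1.0, prior=1.0):
        self.nS, self.nA = n_states, n_actions
        self.R = rewards                              # assumed-known reward table, shape (nS, nA)
        self.gamma, self.beta = gamma, beta
        # Dirichlet pseudo-counts over next states for every (s, a) pair
        self.alpha = np.full((n_states, n_actions, n_states), prior)

    def update(self, s, a, s_next):
        # Posterior update after observing one transition (s, a) -> s_next
        self.alpha[s, a, s_next] += 1.0

    def plan(self, n_iters=200):
        # Value iteration on the posterior-mean model; the bonus keeps the agent
        # optimistic about rarely-visited (s, a) pairs and decays with counts.
        counts = self.alpha.sum(axis=2)               # total pseudo-counts, shape (nS, nA)
        P_mean = self.alpha / counts[:, :, None]      # posterior-mean transition probabilities
        bonus = self.beta / (1.0 + counts)            # illustrative bonus form (an assumption)
        Q = np.zeros((self.nS, self.nA))
        for _ in range(n_iters):
            V = Q.max(axis=1)
            Q = self.R + bonus + self.gamma * (P_mean @ V)
        return Q.argmax(axis=1)                       # greedy policy w.r.t. bonus-augmented values
```

A learner would alternate calling update on each observed transition with plan to choose actions; as counts grow, the bonus vanishes and the policy approaches greedy behavior on the learned model, which is the greedier exploration style the abstract contrasts with cautious PAC-MDP exploration.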
J. Zico Kolter, Andrew Y. Ng
Added: 17 Nov 2009
Updated: 17 Nov 2009
Type: Conference
Year: 2009
Where: ICML
Authors: J. Zico Kolter, Andrew Y. Ng