Open Loop Optimistic Planning

Download www.colt2010.org

We consider the problem of planning in a stochastic and discounted environment with a limited numerical budget. More precisely, we investigate strategies that explore the set of possible sequences of actions so that, once all available numerical resources (e.g. CPU time, number of calls to a generative model) have been used, one returns a recommendation on the best possible immediate action to follow based on this exploration. The performance of a strategy is assessed in terms of its simple regret, that is, the loss in performance resulting from choosing the recommended action instead of an optimal one. We first provide a minimax lower bound for this problem, and show that a uniform planning strategy matches this minimax rate (up to a logarithmic factor). Then we propose a UCB (Upper Confidence Bounds)-based planning algorithm, called OLOP (Open-Loop Optimistic Planning), which is also minimax optimal, and prove that it enjoys much faster rates when there is a small proportion of near-op...
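To make the setting in the abstract concrete, the following is a minimal sketch of a naive uniform open-loop planner. It is not the paper's algorithm and not OLOP: it simply enumerates every action sequence of a fixed length, spends the same number of generative-model calls on each, and recommends the first action of the sequence with the best empirical discounted return. The names `uniform_open_loop_plan`, `step`, `toy_step`, `horizon`, and `episodes_per_sequence` are all hypothetical, and the toy reward model is invented purely for illustration.

```python
import itertools
import random


def uniform_open_loop_plan(step, initial_state, num_actions, gamma,
                           horizon, episodes_per_sequence, rng):
    """Naive uniform planner (illustrative assumption, not the paper's method).

    `step(state, action, rng)` is a hypothetical generative model that
    returns `(next_state, reward)`. Total budget used is
    num_actions**horizon * episodes_per_sequence * horizon model calls.
    Returns (recommended_first_action, empirical_value_of_best_sequence).
    """
    best_action, best_value = None, float("-inf")
    # Open-loop: each candidate is a fixed action sequence chosen in advance.
    for seq in itertools.product(range(num_actions), repeat=horizon):
        total = 0.0
        for _ in range(episodes_per_sequence):
            state, discount = initial_state, 1.0
            for action in seq:
                state, reward = step(state, action, rng)
                total += discount * reward
                discount *= gamma
        value = total / episodes_per_sequence
        if value > best_value:
            best_action, best_value = seq[0], value
    return best_action, best_value


def toy_step(state, action, rng):
    """Hypothetical toy model: action 1 pays 1 with prob. 0.9, action 0 with prob. 0.1."""
    p = 0.9 if action == 1 else 0.1
    reward = 1.0 if rng.random() < p else 0.0
    return state, reward


rng = random.Random(0)
action, value = uniform_open_loop_plan(
    toy_step, initial_state=0, num_actions=2, gamma=0.5,
    horizon=3, episodes_per_sequence=200, rng=rng)
print("recommended first action:", action, "estimated value:", round(value, 3))
```

In this toy model the simple regret of a recommendation is the gap between the value of the optimal first action and the value of the recommended one. The sketch treats all sequences equally, which is the behaviour an optimistic strategy such as OLOP is designed to improve on when few sequences are near-optimal.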

Sébastien Bubeck, Rémi Munos

COLT 2010 | Limited Numerical Budget | Machine Learning | Planning | Possible Immediate Action

Added	10 Feb 2011
Updated	10 Feb 2011
Type	Journal
Year	2010
Where	COLT
Authors	Sébastien Bubeck, Rémi Munos
