A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes

14 years 6 months ago


An issue that is critical for the application of Markov decision processes (MDPs) to realistic problems is how the complexity of planning scales with the size of the MDP. In stochastic environments with very large or even infinite state spaces, traditional planning and reinforcement learning algorithms are often inapplicable, since their running time typically scales linearly with the state space size in the worst case. In this paper we present a new algorithm that, given only a generative model (simulator) for an arbitrary MDP, performs near-optimal planning with a running time that has no dependence on the number of states. Although the running time is exponential in the horizon time (which depends only on the discount factor and the desired degree of approximation to the optimal policy), our results establish for the first time that there are no theoretical barriers to computing near-optimal policies in arbitrarily large, unstructured MDPs.

Michael J. Kearns, Yishay Mansour, Andrew Y. Ng
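
The idea in the abstract can be illustrated with a minimal sketch: build a depth-limited lookahead tree by drawing a fixed number of samples per action from the generative model, then back up the sampled rewards. This is an illustration under stated assumptions, not the authors' implementation. The names sparse_sample_q, plan, toy_simulate, depth and width are hypothetical. The paper derives the depth and the number of samples from the discount factor and the desired accuracy, whereas here they are plain parameters chosen by the caller.

```python
import random


def sparse_sample_q(state, depth, actions, simulate, gamma, width):
    """Estimate Q(state, a) for every action with a sampled lookahead tree.

    simulate(state, action) -> (next_state, reward) is the generative model.
    The work done is about (len(actions) * width) ** depth simulator calls,
    which does not depend on how many states the MDP has.
    """
    if depth == 0:
        return {a: 0.0 for a in actions}  # leaves of the tree are valued at zero
    q = {}
    for a in actions:
        total = 0.0
        for _ in range(width):
            next_state, reward = simulate(state, a)
            next_q = sparse_sample_q(
                next_state, depth - 1, actions, simulate, gamma, width
            )
            total += reward + gamma * max(next_q.values())
        q[a] = total / width  # average over the sampled successors
    return q


def plan(state, depth, actions, simulate, gamma, width):
    """Return the action with the highest estimated Q-value at `state`."""
    q = sparse_sample_q(state, depth, actions, simulate, gamma, width)
    return max(actions, key=lambda a: q[a])


# Hypothetical toy simulator: a noisy walk over the integers, so the state
# space is unbounded. Choosing "right" always pays 1, "left" pays 0.
_rng = random.Random(0)
ACTIONS = ("left", "right")


def toy_simulate(state, action):
    step = 1 if action == "right" else -1
    if _rng.random() < 0.2:  # the move slips in the opposite direction
        step = -step
    reward = 1.0 if action == "right" else 0.0
    return state + step, reward


if __name__ == "__main__":
    print(plan(0, depth=3, actions=ACTIONS, simulate=toy_simulate,
               gamma=0.9, width=3))
```

The planner never enumerates states; it only calls the simulator, which is why the cost grows with the depth and the number of samples rather than with the size of the state space. The cost is still exponential in the depth, matching the trade-off described in the abstract.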
