

Near-Optimal Reinforcement Learning in Polynomial Time

We present new algorithms for reinforcement learning, and prove that they have polynomial bounds on the resources required to achieve near-optimal return in general Markov decision processes. After observing that the number of actions required to approach the optimal return is lower bounded by the mixing time T of the optimal policy in the undiscounted case or by the horizon time T in the discounted case, we then give algorithms requiring a number of actions and total computation time that are only polynomial in T and the number of states, for both the undiscounted and discounted cases. An interesting aspect of our algorithms is their explicit handling of the Exploration/Exploitation trade-off.
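The abstract's central idea is an explicit, per-step decision between exploring and exploiting, driven by how well each state's dynamics have already been estimated. Below is a minimal sketch of that decision rule in Python; the "known state" threshold M_KNOWN, the toy chain MDP, and the placeholder exploit_policy are illustrative assumptions only, standing in for the paper's actual quantities, which depend polynomially on T and the number of states.

```python
import random
from collections import defaultdict

# Hypothetical threshold: a state counts as "known" once every action there
# has been tried at least M_KNOWN times. The paper derives this quantity
# from T, the number of states, and the accuracy parameters; the constant
# here is purely illustrative.
M_KNOWN = 50

def is_known(state, counts, actions):
    """A state is 'known' once all of its actions have enough samples."""
    return all(counts[(state, a)] >= M_KNOWN for a in actions)

def choose_action(state, counts, actions, exploit_policy):
    """Explicit explore-or-exploit decision.

    In an unknown state, do balanced wandering: take the least-tried
    action, which makes the state known quickly. In a known state, follow
    a policy computed on the empirical model of the known states (the
    planning step that produces it is omitted here for brevity).
    """
    if not is_known(state, counts, actions):
        return min(actions, key=lambda a: counts[(state, a)])
    return exploit_policy(state)

# Tiny usage example on a toy 3-state, 2-action chain.
if __name__ == "__main__":
    states, actions = [0, 1, 2], [0, 1]
    counts = defaultdict(int)
    exploit_policy = lambda s: 1  # placeholder policy for known states
    state = 0
    for _ in range(1000):
        a = choose_action(state, counts, actions, exploit_policy)
        counts[(state, a)] += 1
        # Toy dynamics: action 1 tends to move right, otherwise mostly stay.
        if a == 1 and random.random() < 0.8:
            state = min(state + 1, 2)
        elif random.random() < 0.1:
            state = max(state - 1, 0)
```

In unknown states the sketch wanders in a balanced way; in known states it defers to whatever policy was computed on the empirical model, which is where the polynomial-time planning and the choice between exploiting and steering toward unknown states would plug in.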
Type: Journal
Year: 2002
Where: ML
Authors: Michael J. Kearns, Satinder P. Singh