Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms

In this paper, we address two issues of long-standing interest in the reinforcement learning literature. First, what kinds of performance guarantees can be made for Q-learning after only a finite number of actions? Second, what quantitative comparisons can be made between Q-learning and model-based indirect approaches, which use experience to estimate next-state distributions for off-line value iteration? We first show that both Q-learning and the indirect approach enjoy rather rapid convergence to the optimal policy as a function of the number of state transitions observed. In particular, on the order of only (N log(1/ε)/ε²)(log N + log log(1/ε)) transitions are sufficient for both algorithms to come within ε of the optimal policy, in an idealized model that assumes the observed transitions are "well-mixed" throughout an N-state MDP. Thus, the two approaches have roughly the same sample complexity. Perhaps surprisingly, this sample complexity is far less than what is required for the model-based ...
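The sketch below is a minimal illustration (not the paper's code) of the two algorithms being compared: direct Q-learning, which updates Q-values from each observed transition, and the indirect approach, which estimates the transition and reward model from the same samples and then runs off-line value iteration. The toy MDP, its size, the learning rate, and the uniform sampling of (state, action) pairs are all hypothetical stand-ins for the idealized "well-mixed" setting described above.

```python
import numpy as np

# Hypothetical toy MDP: N states, K actions, discount gamma (all assumptions).
N, K, gamma = 5, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(N), size=(N, K))   # true next-state distributions
R = rng.uniform(size=(N, K))                 # true expected rewards

def sample_transitions(T):
    """Draw T 'well-mixed' transitions (s, a, r, s') by sampling (s, a) uniformly."""
    out = []
    for _ in range(T):
        s, a = rng.integers(N), rng.integers(K)
        s2 = rng.choice(N, p=P[s, a])
        out.append((s, a, R[s, a], s2))
    return out

def q_learning(transitions, alpha=0.1):
    """Direct approach: incremental Q-learning updates from the sample stream."""
    Q = np.zeros((N, K))
    for s, a, r, s2 in transitions:
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    return Q

def indirect(transitions, iters=200):
    """Indirect approach: estimate the model from the samples,
    then run off-line value iteration on the estimated model."""
    counts = np.zeros((N, K, N))
    r_sum = np.zeros((N, K))
    for s, a, r, s2 in transitions:
        counts[s, a, s2] += 1
        r_sum[s, a] += r
    n = counts.sum(axis=2, keepdims=True)
    P_hat = np.where(n > 0, counts / np.maximum(n, 1), 1.0 / N)
    R_hat = r_sum / np.maximum(n[..., 0], 1)
    Q = np.zeros((N, K))
    for _ in range(iters):
        Q = R_hat + gamma * P_hat @ Q.max(axis=1)
    return Q

samples = sample_transitions(5000)
print(np.argmax(q_learning(samples), axis=1))  # greedy policy from Q-learning
print(np.argmax(indirect(samples), axis=1))    # greedy policy from the model
```

With enough samples the two greedy policies typically agree, reflecting the abstract's point that both approaches reach a near-optimal policy with roughly the same number of observed transitions.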
Michael J. Kearns, Satinder P. Singh
Added 01 Nov 2010
Updated 01 Nov 2010
Type Conference
Year 1998
Where NIPS
Authors Michael J. Kearns, Satinder P. Singh