For a Markov Decision Process with a finite state space (of size $S$) and finite action spaces (of size $A$ per state), we propose a new algorithm, Delayed Q-learning. We prove it is PAC, achieving near-optimal performance for all but $\tilde{O}(SA)$ timesteps using $O(SA)$ space, improving on the $\tilde{O}(S^2 A)$ bounds of the best previous algorithms. This result proves that efficient reinforcement learning is possible without learning a model of the MDP from experience. Learning takes place from a single continuous thread of experience; no resets or parallel sampling is used. Beyond its smaller storage and experience requirements, Delayed Q-learning's per-experience computation cost is much less than that of previous PAC algorithms.
Alexander L. Strehl, Lihong Li, Eric Wiewiora, John Langford, Michael L. Littman
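As a concrete illustration of the model-free, $O(SA)$-space learning described in the abstract, below is a minimal Python sketch of a Delayed Q-learning agent. The environment interface (`env.reset()` and `env.step(a)`), the particular values of `m`, `eps1`, `gamma`, and `num_steps`, and the purely greedy action selection are illustrative assumptions for this sketch, not the exact settings analyzed in the paper.

```python
import numpy as np

def delayed_q_learning(env, S, A, gamma=0.95, m=20, eps1=0.1, num_steps=100_000):
    """Sketch of Delayed Q-learning on a finite MDP.

    Assumes `env` exposes reset() -> s and step(a) -> (s_next, r, done),
    with S states and A actions. The batch size m and threshold eps1 are
    illustrative; the paper derives specific values from its PAC analysis.
    """
    Q = np.full((S, A), 1.0 / (1.0 - gamma))    # optimistic initialization
    U = np.zeros((S, A))                        # accumulated update targets
    counts = np.zeros((S, A), dtype=int)        # samples since the last attempted update
    last_attempt = np.zeros((S, A), dtype=int)  # time of the last attempted update
    learn = np.ones((S, A), dtype=bool)         # LEARN flags
    t_star = 0                                  # time of the most recent successful update

    s = env.reset()
    for t in range(1, num_steps + 1):
        a = int(np.argmax(Q[s]))                # act greedily w.r.t. the optimistic Q
        s_next, r, done = env.step(a)

        if learn[s, a]:
            U[s, a] += r + gamma * np.max(Q[s_next])
            counts[s, a] += 1
            if counts[s, a] == m:               # attempt an update after m samples
                if Q[s, a] - U[s, a] / m >= 2 * eps1:
                    Q[s, a] = U[s, a] / m + eps1    # successful update
                    t_star = t
                elif last_attempt[s, a] >= t_star:
                    learn[s, a] = False         # no Q-value has changed; stop trying for now
                last_attempt[s, a] = t
                U[s, a] = 0.0
                counts[s, a] = 0
        elif last_attempt[s, a] < t_star:
            learn[s, a] = True                  # some Q-value changed; allow new attempts

        s = env.reset() if done else s_next
    return Q
```

Each timestep touches only the single visited state-action pair, so the per-experience computation is constant apart from the $O(A)$ maximizations, and the tables require $O(SA)$ storage, consistent with the claims in the abstract.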