Sciweavers

COLT
2000
Springer
14 years 3 months ago
Estimation and Approximation Bounds for Gradient-Based Reinforcement Learning
We model reinforcement learning as the problem of learning to control a Partially Observable Markov Decision Process (  ¢¡¤£¦¥§  ), and focus on gradient ascent approache...
Peter L. Bartlett, Jonathan Baxter