R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning

R-max is a very simple model-based reinforcement learning algorithm which can attain near-optimal average reward in polynomial time. In R-max, the agent always maintains a complete, but possibly inaccurate, model of its environment and acts based on the optimal policy derived from this model. The model is initialized in an optimistic fashion: all actions in all states return the maximal possible reward (hence the name). During execution, the model is updated based on the agent's observations. R-max improves upon several previous algorithms: (1) It is simpler and more general than Kearns and Singh's E3 algorithm, covering zero-sum stochastic games. (2) It has a built-in mechanism for resolving the exploration vs. exploitation dilemma. (3) It formally justifies the "optimism under uncertainty" bias used in many RL algorithms. (4) It is simpler, more general, and more efficient than Brafman and Tennenholtz's LSG algorithm for learning in single-controller stochastic game...
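The loop described in the abstract (optimistic initialization, model updates from observations, acting on the optimal policy of the current model) can be illustrated with a minimal sketch. It rests on several assumptions that are not in the abstract: a single-agent tabular MDP rather than a zero-sum stochastic game, discounted value iteration in place of the paper's average-reward analysis, and a fixed visit threshold `m` after which a state-action pair is treated as known. The names `RMaxAgent`, `m`, `gamma`, `plan`, `observe`, and `act` are hypothetical and chosen only for illustration.

```python
class RMaxAgent:
    """Illustrative tabular R-max-style agent (simplified; see assumptions above)."""

    def __init__(self, n_states, n_actions, r_max, m=5, gamma=0.95):
        self.n_states, self.n_actions = n_states, n_actions
        self.r_max = r_max    # upper bound on the one-step reward
        self.m = m            # assumed visit threshold for a "known" pair
        self.gamma = gamma    # assumed discount factor (not in the abstract)
        self.counts = [[[0] * n_states for _ in range(n_actions)]
                       for _ in range(n_states)]
        self.reward_sum = [[0.0] * n_actions for _ in range(n_states)]
        # Optimistic initialization: every action appears to pay r_max forever.
        optimistic = r_max / (1.0 - gamma)
        self.q = [[optimistic] * n_actions for _ in range(n_states)]

    def visits(self, s, a):
        return sum(self.counts[s][a])

    def plan(self, sweeps=200):
        """Value iteration on the current (optimistic) model."""
        optimistic = self.r_max / (1.0 - self.gamma)
        v = [0.0] * self.n_states
        for _ in range(sweeps):
            for s in range(self.n_states):
                for a in range(self.n_actions):
                    n = self.visits(s, a)
                    if n < self.m:
                        # Unknown pair: keep the fictitious max-reward value.
                        self.q[s][a] = optimistic
                    else:
                        # Known pair: empirical reward and transition frequencies.
                        r_hat = self.reward_sum[s][a] / n
                        nxt = sum(c / n * v[s2]
                                  for s2, c in enumerate(self.counts[s][a]))
                        self.q[s][a] = r_hat + self.gamma * nxt
                v[s] = max(self.q[s])  # in-place update within the sweep

    def act(self, s):
        """Greedy action under the current model (first index wins ties)."""
        return max(range(self.n_actions), key=lambda a: self.q[s][a])

    def observe(self, s, a, r, s_next):
        """Record one transition; replan when (s, a) first becomes known."""
        n = self.visits(s, a)
        if n >= self.m:
            return  # known entries are frozen in this sketch
        self.counts[s][a][s_next] += 1
        self.reward_sum[s][a] += r
        if n + 1 == self.m:
            self.plan()


# Usage: after (state 0, action 0) is seen m times with zero reward,
# the still-unknown action 1 looks better, so the agent explores it.
agent = RMaxAgent(n_states=2, n_actions=2, r_max=1.0, m=3)
for _ in range(3):
    agent.observe(0, 0, 0.0, 0)
chosen = agent.act(0)
```

In this sketch, exploration is never requested explicitly: an unknown pair keeps its inflated value, so the greedy policy is drawn toward it until enough observations accumulate. This is one way to read the abstract's "built-in mechanism" for the exploration vs. exploitation dilemma.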

Ronen I. Brafman, Moshe Tennenholtz

Algorithms | IJCAI 2001 | IJCAI 2007 | Near-optimal Average Reward | Reinforcement Learning Algorithm |

Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2001
Where	IJCAI
Authors	Ronen I. Brafman, Moshe Tennenholtz
