A Reinforcement Learning Algorithm with Polynomial Interaction Complexity for Only-Costly-Observable MDPs

An Unobservable MDP (UMDP) is a POMDP in which there are no observations. An Only-Costly-Observable MDP (OCOMDP) is a POMDP that extends a UMDP with a particular costly action that completely observes the state. We introduce UR-MAX, a reinforcement learning algorithm with polynomial interaction complexity for unknown OCOMDPs.
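To make the two definitions concrete, here is a minimal sketch of an OCOMDP as a toy environment. It is not UR-MAX and is not taken from the paper. The class name `ToyOCOMDP`, the action label `OBSERVE`, and the fields are hypothetical. The sketch also assumes that the observing action leaves the state unchanged and that its cost is subtracted from the accumulated reward; the paper's formal model may differ on both points.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyOCOMDP:
    """Toy OCOMDP (illustrative only, not the paper's formal model).

    Ordinary actions return no observation, as in a UMDP. One special
    action reveals the current state exactly, at a fixed cost.
    """
    transitions: dict    # transitions[state][action] -> [(next_state, prob), ...]
    rewards: dict        # rewards[state] -> reward collected on entering state
    observe_cost: float  # assumed: cost charged for each full observation
    state: int = 0
    total_reward: float = 0.0
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    OBSERVE = "observe"  # hypothetical label for the costly observing action

    def step(self, action):
        """Apply an action; return the observation (None unless observing)."""
        if action == self.OBSERVE:
            # Assumption: observing does not move the state, it only costs.
            self.total_reward -= self.observe_cost
            return self.state
        next_states, probs = zip(*self.transitions[self.state][action])
        self.state = self.rng.choices(next_states, weights=probs)[0]
        self.total_reward += self.rewards[self.state]
        return None  # UMDP behaviour: ordinary actions reveal nothing
```

Removing `OBSERVE` from the action set reduces this toy model to a UMDP, which is the sense in which an OCOMDP extends one.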

Roy Fox, Moshe Tennenholtz

AAAI 2007 | Intelligent Agents | Only-Costly-Observable MDP | Particular Costly Action | Unobservable MDP

Added	02 Oct 2010
Updated	02 Oct 2010
Type	Conference
Year	2007
Where	AAAI
Authors	Roy Fox, Moshe Tennenholtz
