EWRL
2008

Markov Decision Processes with Arbitrary Reward Processes

Abstract. We consider a control problem where the decision maker interacts with a standard Markov decision process with the exception that the reward functions vary arbitrarily over time. We extend the notion of Hannan consistency to this setting, showing that, in hindsight, the agent can perform almost as well as every deterministic policy. We present efficient online algorithms in the spirit of reinforcement learning that ensure that the agent's performance loss, or regret, vanishes over time, provided that the environment is oblivious to the agent's actions. However, counterexamples indicate that the regret does not vanish if the environment is not oblivious.
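The notion of regret used in the abstract can be made concrete with a short formula. The block below is a sketch of one common way to formalize it, not a quotation from the paper: the symbols $r_t$ (reward function at step $t$), $(s_t, a_t)$ (the agent's state and action), $\mu$ (a stationary deterministic policy), $(s_t^{\mu}, a_t^{\mu})$ (the trajectory that $\mu$ would have generated), and $L_T$ (regret after $T$ steps) are all notation assumed here for illustration.

```latex
% Assumed notation: regret of the agent after T steps, measured against
% the best stationary deterministic policy chosen in hindsight.
\[
  L_T \;=\; \max_{\mu}\,
  \mathbb{E}\!\left[\frac{1}{T}\sum_{t=1}^{T} r_t\!\left(s_t^{\mu}, a_t^{\mu}\right)\right]
  \;-\; \frac{1}{T}\sum_{t=1}^{T} r_t\!\left(s_t, a_t\right)
\]
% Vanishing regret (the extension of Hannan consistency) then reads:
\[
  \limsup_{T \to \infty} L_T \;\le\; 0 \quad \text{almost surely.}
\]
```

Under this reading, an oblivious environment is one whose reward sequence $r_1, r_2, \dots$ does not depend on the agent's past actions, which is the condition the abstract requires for the regret to vanish.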

Jia Yuan Yu, Shie Mannor, Nahum Shimkin

Added	19 Oct 2010
Updated	19 Oct 2010
Type	Conference
Year	2008
Where	EWRL
Authors	Jia Yuan Yu, Shie Mannor, Nahum Shimkin