Self-Optimizing and Pareto-Optimal Policies in General Environments based on Bayes-Mixtures


Download www.hutter1.net

The problem of making sequential decisions in unknown probabilistic environments is studied. In cycle t, action y_t results in perception x_t and reward r_t, where all quantities may in general depend on the complete history. The perception x_t and reward r_t are sampled from the (reactive) environmental probability distribution
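The interaction cycle described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's method, only a toy rendering of the protocol: in each cycle the agent picks an action from the full history, and the environment returns a perception and a reward that may also depend on that history. The names `run_interaction`, `toy_environment`, and `copy_policy`, as well as the binary action and perception spaces, are assumptions made for illustration.

```python
import random


def run_interaction(policy, environment, cycles, seed=0):
    """Run the agent-environment loop for a fixed number of cycles.

    history holds one (y_t, x_t, r_t) triple per completed cycle, so both
    the policy and the environment can condition on everything seen so far.
    """
    rng = random.Random(seed)
    history = []
    total_reward = 0.0
    for _ in range(cycles):
        y_t = policy(history)                      # action chosen from the history
        x_t, r_t = environment(history, y_t, rng)  # sampled perception and reward
        history.append((y_t, x_t, r_t))
        total_reward += r_t
    return history, total_reward


def toy_environment(history, y_t, rng):
    """Hypothetical reactive environment (an assumption, not from the paper).

    The perception is a random bit. The reward is 1 when the action equals
    the previous perception, so it depends on the history and not only on
    the current action.
    """
    x_t = rng.randint(0, 1)
    prev_x = history[-1][1] if history else None
    r_t = 1.0 if y_t == prev_x else 0.0
    return x_t, r_t


def copy_policy(history):
    """Hypothetical policy: repeat the most recent perception (0 at the start)."""
    return history[-1][1] if history else 0


if __name__ == "__main__":
    history, total = run_interaction(copy_policy, toy_environment, cycles=10)
    print(len(history), total)
```

In this toy setting the copying policy earns a reward in every cycle except the first, where no earlier perception exists. A policy that ignores the history would match the previous perception only by chance.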

Marcus Hutter


CORR 2002 | Education | Perception Xt | Reward Rt | Sequential Decision |


Post Info

Added	18 Dec 2010
Updated	18 Dec 2010
Type	Journal
Year	2002
Where	CORR
Authors	Marcus Hutter
