Search Sciweavers | Sciweavers

75 search results - page 10 / 15

» A Predictive Model for Imitation Learning in Partially Obser...

CORR 2011 · Springer · Education

Doubly Robust Policy Evaluation and Learning

Available from www.icml-2011.org

We study decision making in environments where the reward is only partially observed, but can be modeled as a function of an action and an observed context. This setting, known as...

Miroslav Dudík, John Langford, Lihong Li

AAAI 2007 · Intelligent Agents

Predictive Exploration for Autonomous Science

Available from www.aaai.org

Often remote investigations use autonomous agents to observe an environment on behalf of absent scientists. Predictive exploration improves these systems’ efficiency with onboa...

David R. Thompson

NECO 2010

Posterior Weighted Reinforcement Learning with State Uncertainty

Available from www.maths.bris.ac.uk

Reinforcement learning models generally assume that a stimulus is presented that allows a learner to unambiguously identify the state of nature, and the reward received is drawn f...

Tobias Larsen, David S. Leslie, Edmund J. Collins,...

ALT 2005 · Springer · Machine Learning

Defensive Universal Learning with Experts

Available from www.idsia.ch

This paper shows how universal learning can be achieved with expert advice. To this aim, we specify an experts algorithm with the following characteristics: (a) it uses only feedba...

Jan Poland, Marcus Hutter

UAI 2001 · Artificial Intelligence

The Optimal Reward Baseline for Gradient-Based Reinforcement Learning

Available from cs.anu.edu.au

There exist a number of reinforcement learning algorithms which learn by climbing the gradient of expected reward. Their long-run convergence has been proved, even in partially ob...

Lex Weaver, Nigel Tao
