Sciweavers

75 search results - page 10 / 15
» A Predictive Model for Imitation Learning in Partially Obser...
CORR
2011
Springer
12 years 11 months ago
Doubly Robust Policy Evaluation and Learning
We study decision making in environments where the reward is only partially observed, but can be modeled as a function of an action and an observed context. This setting, known as...
Miroslav Dudík, John Langford, Lihong Li
AAAI
2007
13 years 10 months ago
Predictive Exploration for Autonomous Science
Often remote investigations use autonomous agents to observe an environment on behalf of absent scientists. Predictive exploration improves these systems’ efficiency with onboa...
David R. Thompson
NECO
2010
13 years 6 months ago
Posterior Weighted Reinforcement Learning with State Uncertainty
Reinforcement learning models generally assume that a stimulus is presented that allows a learner to unambiguously identify the state of nature, and the reward received is drawn f...
Tobias Larsen, David S. Leslie, Edmund J. Collins,...
ALT
2005
Springer
14 years 4 months ago
Defensive Universal Learning with Experts
This paper shows how universal learning can be achieved with expert advice. To this aim, we specify an experts algorithm with the following characteristics: (a) it uses only feedba...
Jan Poland, Marcus Hutter
UAI
2001
13 years 9 months ago
The Optimal Reward Baseline for Gradient-Based Reinforcement Learning
There exist a number of reinforcement learning algorithms which learn by climbing the gradient of expected reward. Their long-run convergence has been proved, even in partially ob...
Lex Weaver, Nigel Tao