Sciweavers

179 search results - page 33 / 36
» Learning Relational Navigation Policies
WWW
2008
ACM
14 years 8 months ago
As we may perceive: finding the boundaries of compound documents on the web
This paper considers the problem of identifying on the Web compound documents (cDocs), groups of web pages that in aggregate constitute semantically coherent information entities...
Pavel Dmitriev
COLT
2008
Springer
13 years 9 months ago
Adapting to a Changing Environment: the Brownian Restless Bandits
In the multi-armed bandit (MAB) problem there are k distributions associated with the rewards of playing each of k strategies (slot machine arms). The reward distributions are ini...
Aleksandrs Slivkins, Eli Upfal
NIPS
2007
13 years 9 months ago
Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs
We present an algorithm called Optimistic Linear Programming (OLP) for learning to optimize average reward in an irreducible but otherwise unknown Markov decision process (MDP). O...
Ambuj Tewari, Peter L. Bartlett
ENVSOFT
2007
13 years 7 months ago
Uncertainty and precaution in environmental management: Insights from the UPEM conference
Communication across the science-policy interface is complicated by uncertainty and ignorance associated with predictions on which to base policies. The international symposium ...
Jeroen P. van der Sluijs
NIPS
2008
13 years 9 months ago
Goal-directed decision making in prefrontal cortex: a computational framework
Research in animal learning and behavioral neuroscience has distinguished between two forms of action control: a habit-based form, which relies on stored action values, and a goal...
Matthew Botvinick, James An