Sciweavers

AAAI
2007
Active Imitation Learning
Imitation learning, also called learning by watching or programming by demonstration, has emerged as a means of accelerating many reinforcement learning tasks. Previous work has s...
Aaron P. Shon, Deepak Verma, Rajesh P. N. Rao
ECCV
2010
Springer
Discriminative Tracking by Metric Learning
We present a discriminative model that casts appearance modeling and visual matching into a single objective for visual tracking. Most previous discriminative models for visual tra...
ATAL
2004
Springer
A Pheromone-Based Utility Model for Collaborative Foraging
Multi-agent research often borrows from biology, where remarkable examples of collective intelligence may be found. One interesting example is ant colonies’ use of pheromones as...
Liviu Panait, Sean Luke
ACL
2008
Learning Effective Multimodal Dialogue Strategies from Wizard-of-Oz Data: Bootstrapping and Evaluation
We address two problems in the field of automatic optimization of dialogue strategies: learning effective dialogue strategies when no initial data or system exists, and evaluating...
Verena Rieser, Oliver Lemon
COGSR
2011
Psychological models of human and optimal performance in bandit problems
In bandit problems, a decision-maker must choose between a set of alternatives, each of which has a fixed but unknown rate of reward, to maximize their total number of rewards ov...
Michael D. Lee, Shunan Zhang, Miles Munro, Mark St...