Sciweavers

2 search results - page 1 / 1
JMLR
2012
11 years 10 months ago
Contextual Bandit Learning with Predictable Rewards
Contextual bandit learning is a reinforcement learning problem where the learner repeatedly receives a set of features (context), takes an action and receives a reward based on th...
Alekh Agarwal, Miroslav Dudík, Satyen Kale,...
CORR
2011
Springer
12 years 11 months ago
Doubly Robust Policy Evaluation and Learning
We study decision making in environments where the reward is only partially observed, but can be modeled as a function of an action and an observed context. This setting, known as...
Miroslav Dudík, John Langford, Lihong Li