Sciweavers

74 search results - page 5 / 15
» Regret Bounds for Gaussian Process Bandit Problems
Sort
View
ICML
2009
IEEE
14 years 8 months ago
Piecewise-stationary bandit problems with side observations
We consider a sequential decision problem where the rewards are generated by a piecewise-stationary distribution. However, the different reward distributions are unknown and may c...
Jia Yuan Yu, Shie Mannor
CORR
2010
Springer
187views Education» more  CORR 2010»
13 years 7 months ago
Learning in A Changing World: Non-Bayesian Restless Multi-Armed Bandit
We consider the restless multi-armed bandit (RMAB) problem with unknown dynamics. In this problem, at each time, a player chooses K out of N (N > K) arms to play. The state of ...
Haoyang Liu, Keqin Liu, Qing Zhao
COLT
2010
Springer
13 years 5 months ago
Nonparametric Bandits with Covariates
We consider a bandit problem which involves sequential sampling from two populations (arms). Each arm produces a noisy reward realization which depends on an observable random cov...
Philippe Rigollet, Assaf Zeevi
COLT
2005
Springer
13 years 9 months ago
From External to Internal Regret
External regret compares the performance of an online algorithm, selecting among N actions, to the performance of the best of those actions in hindsight. Internal regret compares ...
Avrim Blum, Yishay Mansour
JMLR
2012
11 years 10 months ago
Contextual Bandit Learning with Predictable Rewards
Contextual bandit learning is a reinforcement learning problem where the learner repeatedly receives a set of features (context), takes an action and receives a reward based on th...
Alekh Agarwal, Miroslav Dudík, Satyen Kale,...