Search Sciweavers | Sciweavers

74 search results - page 5 / 15

Regret Bounds for Gaussian Process Bandit Problems

ICML
2009
IEEE

Piecewise-stationary bandit problems with side observations

16 years 6 months ago

Download www.cim.mcgill.ca

We consider a sequential decision problem where the rewards are generated by a piecewise-stationary distribution. However, the different reward distributions are unknown and may c...

Jia Yuan Yu, Shie Mannor

CORR
2010
Springer

Learning in A Changing World: Non-Bayesian Restless Multi-Armed Bandit

15 years 5 months ago

Download www.ece.ucdavis.edu

We consider the restless multi-armed bandit (RMAB) problem with unknown dynamics. In this problem, at each time, a player chooses K out of N (N > K) arms to play. The state of ...

Haoyang Liu, Keqin Liu, Qing Zhao

COLT
2010
Springer

Nonparametric Bandits with Covariates

15 years 2 months ago

Download www.princeton.edu

We consider a bandit problem which involves sequential sampling from two populations (arms). Each arm produces a noisy reward realization which depends on an observable random cov...

Philippe Rigollet, Assaf Zeevi

COLT
2005
Springer

From External to Internal Regret

15 years 7 months ago

Download www.cs.cmu.edu

External regret compares the performance of an online algorithm, selecting among N actions, to the performance of the best of those actions in hindsight. Internal regret compares ...

Avrim Blum, Yishay Mansour

JMLR
2012

Contextual Bandit Learning with Predictable Rewards

13 years 7 months ago

Download www.cs.princeton.edu

Contextual bandit learning is a reinforcement learning problem where the learner repeatedly receives a set of features (context), takes an action and receives a reward based on th...

Alekh Agarwal, Miroslav Dudík, Satyen Kale,...
