Search Sciweavers | Sciweavers

COLT
2004
Springer

78 views, Machine Learning

Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary

14 years 3 months ago

We give an algorithm for the bandit version of a very general online optimization problem considered by Kalai and Vempala [1], for the case of an adaptive adversary. In this proble...

H. Brendan McMahan, Avrim Blum

COLT
2008
Springer

179 views, Machine Learning

Adapting to a Changing Environment: the Brownian Restless Bandits

13 years 11 months ago

In the multi-armed bandit (MAB) problem there are k distributions associated with the rewards of playing each of k strategies (slot machine arms). The reward distributions are ini...

Aleksandrs Slivkins, Eli Upfal

ICML
2009
IEEE

170 views, Machine Learning

Interactively optimizing information retrieval systems as a dueling bandits problem

14 years 10 months ago

We present an on-line learning framework tailored towards real-time learning from observed user behavior in search engines and other information retrieval systems. In particular, ...

Yisong Yue, Thorsten Joachims

STOC
2007
ACM

146 views, Algorithms

Playing games with approximation algorithms

14 years 10 months ago

In an online linear optimization problem, on each period t, an online algorithm chooses s_t ∈ S from a fixed (possibly infinite) set S of feasible decisions. Nature (who may be adve...

Sham M. Kakade, Adam Tauman Kalai, Katrina Ligett

ALT
2008
Springer

141 views, Machine Learning

Online Regret Bounds for Markov Decision Processes with Deterministic Transitions

14 years 6 months ago

We consider an upper confidence bound algorithm for Markov decision processes (MDPs) with deterministic transitions. For this algorithm we derive upper bounds on the onl...

Ronald Ortner
