Sciweavers

Search: Regret Bounds for Prediction Problems
JMLR
2012
Contextual Bandit Learning with Predictable Rewards
Contextual bandit learning is a reinforcement learning problem where the learner repeatedly receives a set of features (context), takes an action and receives a reward based on th...
Alekh Agarwal, Miroslav Dudík, Satyen Kale,...
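The following minimal Python sketch makes the interaction protocol in this snippet concrete: receive a context, take an action, and observe the reward for that action only. It uses a plain epsilon-greedy learner over a small discrete context set. This is a generic baseline chosen for illustration, not the algorithm from the paper. The function name, the reward function and the two-context environment are all hypothetical.

```python
import random

def epsilon_greedy_contextual(contexts, actions, reward_fn, rounds, epsilon=0.1, seed=0):
    """Hypothetical epsilon-greedy learner for a contextual bandit with discrete contexts."""
    rng = random.Random(seed)
    counts = {}  # (context, action) -> number of times the pair was played
    means = {}   # (context, action) -> running mean of observed rewards
    total = 0.0
    for _ in range(rounds):
        x = rng.choice(contexts)                 # 1. the environment reveals a context
        if rng.random() < epsilon:
            a = rng.choice(actions)              # 2a. explore: uniformly random action
        else:                                    # 2b. exploit: best estimate for this context
            a = max(actions, key=lambda b: means.get((x, b), 0.0))
        r = reward_fn(x, a)                      # 3. only the chosen action's reward is seen
        key = (x, a)
        counts[key] = counts.get(key, 0) + 1
        old = means.get(key, 0.0)
        means[key] = old + (r - old) / counts[key]  # incremental mean update
        total += r
    return means, total

# Hypothetical environment: action 0 is correct for context "a", action 1 for context "b".
def reward(x, a):
    return 1.0 if (x == "a") == (a == 0) else 0.0

means, total = epsilon_greedy_contextual(["a", "b"], ["0", "1"] and [0, 1], reward, rounds=2000)
```

Only the reward of the chosen action is observed, so the learner never learns what the other action would have paid in that round. This partial feedback is what separates the bandit setting from full-information online prediction.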
COCOON
2006
Springer
Approximating Min-Max (Regret) Versions of Some Polynomial Problems
While the complexity of min-max and min-max regret versions of most classical combinatorial optimization problems has been thoroughly investigated, there are very few stu...
Hassene Aissi, Cristina Bazgan, Daniel Vanderpoote...
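A toy instance helps illustrate the two objectives named in the title. The sketch below assumes a hypothetical problem whose feasible solutions are listed explicitly, each with its cost under every scenario, and it evaluates both criteria by brute force. Min-max picks the solution with the smallest worst-case cost. Min-max regret picks the solution with the smallest worst-case gap to the best solution of each scenario. The sketch says nothing about the approximation results the paper studies, and the instance is made up.

```python
def minmax_choices(costs):
    """costs: dict mapping a (hypothetical) solution name to its cost under each scenario."""
    solutions = list(costs)
    n_scenarios = len(next(iter(costs.values())))
    # Optimal cost within each scenario, taken over all feasible solutions.
    opt = [min(costs[x][s] for x in solutions) for s in range(n_scenarios)]
    # Min-max criterion: worst-case cost of each solution.
    max_cost = {x: max(costs[x]) for x in solutions}
    # Min-max regret criterion: worst-case gap to the scenario optimum.
    max_regret = {x: max(costs[x][s] - opt[s] for s in range(n_scenarios)) for x in solutions}
    return min(max_cost, key=max_cost.get), min(max_regret, key=max_regret.get), max_regret

costs = {"P1": [10, 1], "P2": [9, 5], "P3": [11, 1]}
minmax, minmax_regret, regret = minmax_choices(costs)
# minmax == "P2", minmax_regret == "P1", regret == {"P1": 1, "P2": 4, "P3": 2}
```

On this instance the two criteria disagree. P2 has the lowest worst-case cost (9), but in the second scenario it is 4 above the optimum. P1 is never more than 1 above the optimum of any scenario.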
ALT
2007
Springer
Tuning Bandit Algorithms in Stochastic Environments
Algorithms based on upper-confidence bounds for balancing exploration and exploitation are gaining popularity since they are easy to implement, efficient and effective. In this p...
Jean-Yves Audibert, Rémi Munos, Csaba Szepe...
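The upper-confidence-bound idea that the abstract refers to can be illustrated with the classic UCB1 index. The index is the empirical mean of an arm plus sqrt(2 ln t / n_i), where n_i is the number of pulls of arm i so far. The sketch below runs UCB1 on simulated Bernoulli arms. It is the standard UCB1 rule, not the variance-aware tuning studied in the paper, and the arm means and horizon are made up.

```python
import math
import random

def ucb1(arm_probs, horizon, seed=0):
    """Run UCB1 on simulated Bernoulli arms and return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once so each index is defined
        else:
            # Optimism in the face of uncertainty: empirical mean plus confidence radius.
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

counts = ucb1([0.2, 0.5, 0.8], horizon=5000)
```

The confidence radius shrinks as an arm is pulled and grows slowly with t. As a result, clearly worse arms are still tried occasionally, but only about logarithmically often in the horizon, and most pulls go to the best arm.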
JMLR
2010
Regret Bounds for Gaussian Process Bandit Problems
Bandit algorithms are concerned with trading exploration with exploitation where a number of options are available but we can only learn their quality by experimenting with them. ...
Steffen Grünewälder, Jean-Yves Audibert,...
ECCC
2010
Regret Minimization for Online Buffering Problems Using the Weighted Majority Algorithm
Suppose a decision maker has to purchase a commodity over time with varying prices and demands. In particular, the price per unit might depend on the amount purchased and this pri...
Melanie Winkler, Berthold Vöcking, Sascha Geu...
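The title names the Weighted Majority Algorithm. The sketch below shows its randomized, exponentially weighted form (often called Hedge). After every round, each expert's weight is multiplied by exp(-eta * loss), and the learner's expected loss is the weight-averaged loss of the experts. The snippet does not say how buffering or purchasing strategies become experts with losses in [0, 1]. The loss sequence and the learning rate below are therefore hypothetical stand-ins.

```python
import math

def hedge(loss_rows, eta=0.1):
    """Exponentially weighted (randomized weighted majority) forecaster.

    loss_rows: per-round loss vectors, one loss in [0, 1] per expert.
    Returns the learner's cumulative expected loss and the final expert weights.
    """
    n = len(loss_rows[0])
    weights = [1.0] * n
    total = 0.0
    for losses in loss_rows:
        z = sum(weights)
        probs = [w / z for w in weights]                      # play expert i with prob. w_i / z
        total += sum(p * l for p, l in zip(probs, losses))    # expected loss this round
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]  # multiplicative update
    return total, weights

# Hypothetical losses: expert 0 always loses 0.2 per round, expert 1 always loses 0.8.
rows = [[0.2, 0.8]] * 1000
total, weights = hedge(rows)
```

The best expert accumulates a loss of 200 here. The learner's cumulative expected loss exceeds that by only a small additive term, because the weight on the worse expert decays geometrically.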