Regret Bounds and Minimax Policies under Partial Monitoring



This work deals with four classical prediction settings, namely full information, bandit, label efficient, and bandit label efficient, as well as four different notions of regret: pseudo-regret, expected regret, high probability regret, and tracking the best expert regret. We introduce a new forecaster, INF (Implicitly Normalized Forecaster), based on an arbitrary function ψ, for which we propose a unified analysis of its pseudo-regret in the four games we consider. In particular, for ψ(x) = exp(ηx) + γ/K, INF reduces to the classical exponentially weighted average forecaster, and our analysis of the pseudo-regret recovers known results, while for the expected regret we slightly tighten the bounds. On the other hand, with ψ(x) = (η/(-x))^q + γ/K, which defines a new forecaster, we are able to remove the extraneous logarithmic factor in the pseudo-regret bounds for bandit games, and thus fill in a long open gap in the characterization of the minimax rate for the pseudo-regret in the bandit game. We also p...
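
The idea of implicit normalization can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the arm probabilities take the form p_i = ψ(V_i - C), where V_i is an estimated cumulative gain for arm i and C is the constant that makes the probabilities sum to 1, and that ψ has the two forms ψ(x) = exp(ηx) + γ/K and ψ(x) = (η/(-x))^q + γ/K. The function names, the bisection search, and the sample parameter values are all hypothetical choices made for illustration.

```python
import math


def make_psi_exp(eta, gamma, K):
    """psi(x) = exp(eta*x) + gamma/K (assumed exponential form)."""
    return lambda x: math.exp(eta * x) + gamma / K


def make_psi_poly(eta, gamma, K, q):
    """psi(x) = (eta/(-x))**q + gamma/K for x < 0 (assumed polynomial form)."""
    def psi(x):
        if x >= 0:
            return math.inf  # outside the domain; forces the search to raise C
        return (eta / (-x)) ** q + gamma / K
    return psi


def inf_probabilities(gains, psi, iters=200):
    """Find C with sum_i psi(V_i - C) = 1 by bisection, then return the p_i.

    Relies on psi being increasing, so the sum is decreasing in C.
    """
    lo = max(gains)
    gap = 1.0
    # Grow the bracket until the total mass drops to 1 or below.
    while sum(psi(v - (lo + gap)) for v in gains) > 1.0:
        gap *= 2.0
    hi = lo + gap
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(psi(v - mid) for v in gains) > 1.0:
            lo = mid  # mass too large: C must increase
        else:
            hi = mid
    return [psi(v - hi) for v in gains]


# Hypothetical estimated cumulative gains for K = 3 arms.
gains = [0.0, 1.0, 3.0]
p_exp = inf_probabilities(gains, make_psi_exp(eta=0.5, gamma=0.1, K=3))
p_poly = inf_probabilities(gains, make_psi_poly(eta=1.0, gamma=0.1, K=3, q=2))
```

Under these assumptions, the exponential choice has a closed-form solution: the probabilities equal (1 - γ) exp(ηV_i) / Σ_j exp(ηV_j) + γ/K, which is the exponentially weighted average forecaster mixed with the uniform distribution. The polynomial choice has no such closed form in general, which is why the sketch solves for C numerically.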

Jean-Yves Audibert, Sébastien Bubeck


Bandit | Bandit Game | Forecaster | JMLR 2010 |


Post Info

Added	19 May 2011
Updated	19 May 2011
Type	Journal
Year	2010
Where	JMLR
Authors	Jean-Yves Audibert, Sébastien Bubeck
