bandit | Sciweavers

JMLR
2012

Contextual Bandit Learning with Predictable Rewards

13 years 9 months ago

Contextual bandit learning is a reinforcement learning problem where the learner repeatedly receives a set of features (context), takes an action and receives a reward based on th...

Alekh Agarwal, Miroslav Dudík, Satyen Kale,...

AMAI
2011
Springer

Multi-armed bandits with episode context

14 years 6 months ago

A multi-armed bandit episode consists of n trials, each allowing selection of one of K arms, resulting in payoff from a distribution over [0, 1] associated with that arm. We assum...

Christopher D. Rosin

AGI
2011

Reinforcement Learning and the Bayesian Control Rule

14 years 10 months ago

We present an actor-critic scheme for reinforcement learning in complex domains. The main contribution is to show that planning and I/O dynamics can be separated such that an intra...

Pedro Alejandro Ortega, Daniel Alexander Braun, Si...

JMLR
2010

Regret Bounds and Minimax Policies under Partial Monitoring

15 years 1 month ago

This work deals with four classical prediction settings, namely full information, bandit, label efficient and bandit label efficient as well as four different notions of regret: p...

Jean-Yves Audibert, Sébastien Bubeck

CORR
2008
Springer

Multi-Armed Bandits in Metric Spaces

15 years 6 months ago

In a multi-armed bandit problem, an online algorithm chooses from a set of strategies in a sequence of n trials so as to maximize the total payoff of the chosen strategies. While ...

Robert Kleinberg, Aleksandrs Slivkins, Eli Upfal

SDM
2007
SIAM

Bandits for Taxonomies: A Model-based Approach

15 years 8 months ago

We consider a novel problem of learning an optimal matching, in an online fashion, between two feature spaces that are organized as taxonomies. We formulate this as a multi-armed ...

Sandeep Pandey, Deepak Agarwal, Deepayan Chakrabar...
