Search Sciweavers | Sciweavers

44 search results - page 5 / 9

Query: A structured multiarmed bandit problem and the greedy policy

TWC 2008

On myopic sensing for multi-channel opportunistic access: structure, optimality, and performance

Download anrg.usc.edu

We consider a multi-channel opportunistic communication system where the states of these channels evolve as independent and statistically identical Markov chains (the Gilbert-Elli...

Qing Zhao, Bhaskar Krishnamachari, Keqin Liu
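The snippet above is truncated, so the following is only a minimal sketch of myopic sensing. It assumes the standard two-state Gilbert-Elliot formulation:

- state 1 means "good" and state 0 means "bad";
- `p11` is the probability of staying good and `p01` the probability of moving from bad to good;
- one channel is sensed per slot, with a reward of 1 when the sensed channel is good.

The names `propagate`, `myopic_step` and the `observe` callback are hypothetical helpers for illustration and are not taken from the paper. Under these assumptions, the myopic policy senses the channel with the highest belief of being good. Each belief is then updated: from the observation for the sensed channel, and from a one-step Markov prediction for every other channel.

```python
# Sketch only: assumes a two-state (Gilbert-Elliot) channel model with
# p11 = P(good -> good) and p01 = P(bad -> good); one channel sensed per slot.

def propagate(omega, p11, p01):
    """One-step belief prediction for a channel that was not observed."""
    return omega * p11 + (1.0 - omega) * p01


def myopic_step(beliefs, p11, p01, observe):
    """Sense the channel with the highest belief, then update all beliefs.

    beliefs[i] is the probability that channel i is good in the current slot.
    observe(i) is a hypothetical callback returning True if channel i is good.
    Returns (sensed index, immediate reward, beliefs for the next slot).
    """
    # Myopic choice: maximize the expected immediate reward, i.e. the belief.
    i = max(range(len(beliefs)), key=lambda k: beliefs[k])
    good = observe(i)
    reward = 1.0 if good else 0.0

    next_beliefs = []
    for k, w in enumerate(beliefs):
        if k == i:
            # The sensed state is known exactly, so the next-slot belief
            # is just the corresponding transition probability.
            next_beliefs.append(p11 if good else p01)
        else:
            next_beliefs.append(propagate(w, p11, p01))
    return i, reward, next_beliefs
```

With beliefs `[0.2, 0.6, 0.4]`, `p11 = 0.8`, `p01 = 0.2` and a good observation, channel 1 is sensed. The next-slot beliefs become approximately `[0.32, 0.8, 0.44]`.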


Publication

Rollout Sampling Approximate Policy Iteration

Download www.springerlink.com

Several researchers have recently investigated the connection between reinforcement learning and classification. We are motivated by proposals of approximate policy iteration schem...

Christos Dimitrakakis, Michail G. Lagoudakis


CORR 2010, Springer

An Optimal Dynamic Mechanism for Multi-Armed Bandit Processes

Download research.microsoft.com

We consider the problem of revenue-optimal dynamic mechanism design in settings where agents' types evolve over time as a function of their (both public and private) experien...

Sham M. Kakade, Ilan Lobel, Hamid Nazerzadeh


FOCS 2007, IEEE

Approximation Algorithms for Partial-Information Based Stochastic Control with Markovian Rewards

Download www.cis.upenn.edu

We consider a variant of the classic multi-armed bandit problem (MAB), which we call FEEDBACK MAB, where the reward obtained by playing each of n independent arms varies according...

Sudipto Guha, Kamesh Munagala


COLT 2010, Springer

An Asymptotically Optimal Bandit Algorithm for Bounded Support Models

Download www.colt2010.org

The multiarmed bandit problem is a typical example of the dilemma between exploration and exploitation in reinforcement learning. This problem is expressed as a model of a gambler playi...

Junya Honda, Akimichi Takemura
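The truncated abstract does not describe the authors' algorithm, so the following only illustrates the exploration/exploitation dilemma it mentions. It uses plain epsilon-greedy, a generic textbook policy and not the method of this paper. The sketch makes these assumptions:

- rewards lie in [0, 1], matching the bounded-support setting;
- each arm is modeled as a zero-argument callable;
- the function name `epsilon_greedy` is hypothetical.

With probability `epsilon` the policy explores a uniformly random arm; otherwise it exploits the arm with the best empirical mean.

```python
import random


def epsilon_greedy(arms, horizon, epsilon=0.1, seed=0):
    """Generic epsilon-greedy bandit loop (illustration, not the paper's method).

    arms: list of zero-argument callables, each returning a reward in [0, 1].
    Returns (pull counts, empirical means, total reward).
    """
    rng = random.Random(seed)
    n = len(arms)
    counts = [0] * n
    means = [0.0] * n
    total = 0.0
    for t in range(horizon):
        if t < n:
            i = t  # play every arm once so each mean is initialized
        elif rng.random() < epsilon:
            i = rng.randrange(n)  # explore: uniformly random arm
        else:
            i = max(range(n), key=lambda k: means[k])  # exploit: best mean so far
        r = arms[i]()
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]  # incremental running mean
        total += r
    return counts, means, total
```

A fixed `epsilon` keeps exploring forever, which is exactly the cost that asymptotically optimal policies are designed to reduce. The sketch is only meant to make the trade-off concrete.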

