Sciweavers

44 search results - page 5 / 9
» A structured multiarmed bandit problem and the greedy policy
TWC
2008
13 years 7 months ago
On myopic sensing for multi-channel opportunistic access: structure, optimality, and performance
We consider a multi-channel opportunistic communication system where the states of these channels evolve as independent and statistically identical Markov chains (the Gilbert-Elli...
Qing Zhao, Bhaskar Krishnamachari, Keqin Liu

Publication
14 years 4 months ago
Rollout Sampling Approximate Policy Iteration
Several researchers have recently investigated the connection between reinforcement learning and classification. We are motivated by proposals of approximate policy iteration schem...
Christos Dimitrakakis, Michail G. Lagoudakis
CORR
2010
Springer
13 years 7 months ago
An Optimal Dynamic Mechanism for Multi-Armed Bandit Processes
We consider the problem of revenue-optimal dynamic mechanism design in settings where agents' types evolve over time as a function of their (both public and private) experien...
Sham M. Kakade, Ilan Lobel, Hamid Nazerzadeh
FOCS
2007
IEEE
14 years 1 month ago
Approximation Algorithms for Partial-Information Based Stochastic Control with Markovian Rewards
We consider a variant of the classic multi-armed bandit problem (MAB), which we call FEEDBACK MAB, where the reward obtained by playing each of n independent arms varies according...
Sudipto Guha, Kamesh Munagala
COLT
2010
Springer
13 years 5 months ago
An Asymptotically Optimal Bandit Algorithm for Bounded Support Models
The multiarmed bandit problem is a typical example of a dilemma between exploration and exploitation in reinforcement learning. This problem is expressed as a model of a gambler playi...
Junya Honda, Akimichi Takemura