Experience-efficient learning in associative bandit problems

We formalize the associative bandit problem framework introduced by Kaelbling as a learning-theory problem. The learning environment is modeled as a k-armed bandit whose arm payoffs are conditioned on an observable input presented on each trial. We show that, when the payoff functions are constrained to a known hypothesis class, learning can be performed efficiently with respect to the VC dimension of that class. We formally reduce the associative bandit problem to PAC classification, producing an efficient algorithm for any hypothesis class for which efficient classification algorithms are known. We demonstrate the approach empirically on a scalable concept class.
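
To make the setting concrete, the following Python sketch simulates the associative bandit interaction loop: on each trial the learner observes an input x, pulls one of k arms, and receives a Bernoulli payoff whose probability depends on x through a hidden function from a simple threshold class. The epsilon-greedy baseline and all names here (payoff, run, THRESHOLDS) are illustrative assumptions for exposition, not the paper's algorithm, which exploits the hypothesis class directly.

import random

# Hidden payoff functions: arm i pays off with probability 0.9 when the
# observed input x exceeds its threshold, and 0.1 otherwise. (Illustrative
# threshold class; the paper allows any class of bounded VC dimension.)
K = 3
THRESHOLDS = [0.2, 0.5, 0.8]

def payoff(arm, x):
    """Bernoulli reward conditioned on the observable input x."""
    p = 0.9 if x > THRESHOLDS[arm] else 0.1
    return 1 if random.random() < p else 0

def run(trials=10000, epsilon=0.1, bins=10):
    """Epsilon-greedy baseline that discretizes x into bins and tracks
    per-(bin, arm) empirical payoff means; a crude stand-in for a learner
    that searches the hypothesis class itself."""
    counts = [[0] * K for _ in range(bins)]
    means = [[0.0] * K for _ in range(bins)]
    total = 0
    for _ in range(trials):
        x = random.random()                  # observable input for this trial
        b = min(int(x * bins), bins - 1)     # which bin the input falls in
        if random.random() < epsilon:
            arm = random.randrange(K)        # explore uniformly
        else:
            arm = max(range(K), key=lambda a: means[b][a])  # exploit
        r = payoff(arm, x)
        counts[b][arm] += 1
        means[b][arm] += (r - means[b][arm]) / counts[b][arm]
        total += r
    return total / trials

if __name__ == "__main__":
    print(f"average payoff over 10000 trials: {run():.3f}")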
Type: Conference
Year: 2006
Where: ICML
Authors: Alexander L. Strehl, Chris Mesterharm, Michael L. Littman, Haym Hirsh