Exploration scavenging

We examine the problem of evaluating a policy in the contextual bandit setting using only observations collected during the execution of another policy. We show that policy evaluation can be impossible if the exploration policy chooses actions based on the side information provided at each time step. We then propose and prove the correctness of a principled method for policy evaluation which works when this is not the case, even when the exploration policy is deterministic, as long as each action is explored sufficiently often. We apply this general technique to the problem of offline evaluation of internet advertising policies. Although our theoretical results hold only when the exploration policy chooses ads independent of side information, an assumption that is typically violated by commercial systems, we show how clever uses of the theory provide non-trivial and realistic applications. We also provide an empirical demonstration of the effectiveness of our techniques on real ad pla...
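The abstract's two conditions (actions chosen independently of the side information, and every action explored sufficiently often) suggest a simple count-normalized estimator. The following is a minimal sketch under those assumptions, not necessarily the exact procedure from the paper: each logged reward is counted only when the evaluated policy would have chosen the logged action, and it is divided by the number of times that action appears in the log. The names `scavenged_value`, `Event`, `actions`, and the toy log are hypothetical and introduced here for illustration only.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, Sequence, Tuple

# One logged event: (side information, action taken, observed reward).
Event = Tuple[Hashable, Hashable, float]


def scavenged_value(
    events: Sequence[Event],
    policy: Callable[[Hashable], Hashable],
    actions: Iterable[Hashable] = (),
) -> float:
    """Estimate the per-step value of `policy` from logged exploration data.

    Assumes the logging policy picked actions independently of the context.
    `actions` is an optional (hypothetical) list of actions that must each
    appear at least once in the log.
    """
    # Number of times each action was taken by the exploration policy.
    counts = Counter(action for _, action, _ in events)
    missing = [a for a in actions if counts[a] == 0]
    if missing:
        raise ValueError(f"actions never explored: {missing}")

    total = 0.0
    for context, action, reward in events:
        # Keep the reward only when the evaluated policy agrees with the log,
        # normalized by how often that action was explored.
        if policy(context) == action:
            total += reward / counts[action]
    return total


# Toy log: the logger plays "a", "a", "b", "b" regardless of the context.
log = [(0, "a", 1.0), (1, "a", 0.0), (0, "b", 0.0), (1, "b", 1.0)]
match_context = lambda x: "a" if x == 0 else "b"
print(scavenged_value(log, match_context, actions=["a", "b"]))
```

The normalization relies on the independence assumption: if the logger's choice of action depended on the context, the events recorded for each action would no longer be a representative sample of contexts and the estimate could be arbitrarily biased, which is the failure case the abstract warns about.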

John Langford, Alexander L. Strehl, Jennifer Wortman

Exploration Policy | ICML 2008 | Machine Learning | Policy Chooses Actions | Policy Chooses Ads |

» Exploring data reliability tradeoffs in replicated storage systems

» Technologies for an Autonomous Wireless Home Healthcare System

» Exploring the barrier to entry incremental generational garbage collection for Haskell

» An Ultra Low Power System Architecture for Sensor Network Applications

Post Info

Added	17 Nov 2009
Updated	17 Nov 2009
Type	Conference
Year	2008
Where	ICML
Authors	John Langford, Alexander L. Strehl, Jennifer Wortman
