Linearly Parameterized Bandits

We consider bandit problems involving a large (possibly infinite) collection of arms, in which the expected reward of each arm is a linear function of an $r$-dimensional random vector $Z \in \mathbb{R}^r$, where $r \geq 2$. The objective is to minimize the cumulative regret and Bayes risk. When the set of arms corresponds to the unit sphere, we prove that the regret and Bayes risk are of order $\Theta(r\sqrt{T})$, by establishing a lower bound for an arbitrary policy, and showing that a matching upper bound is obtained through a policy that alternates between exploration and exploitation phases. The phase-based policy is also shown to be effective if the set of arms satisfies a strong convexity condition. For the case of a general set of arms, we describe a near-optimal policy whose regret and Bayes risk admit upper bounds of the form $O(r\sqrt{T}\log^{3/2} T)$.

Original Submission: January 19, 2009
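
To make the idea of alternating exploration and exploitation phases concrete, here is a minimal Python sketch for the unit-sphere case. It is an illustration under stated assumptions, not necessarily the authors' exact policy: the function name `phased_policy_regret`, the Gaussian noise model, the noise level, and the schedule (in cycle c, pull each of the r standard basis vectors once, then play the greedy arm c times) are all choices made for this example.

```python
import math
import random


def phased_policy_regret(z, horizon, noise_sd=0.1, seed=0):
    """Cumulative expected regret of a phase-based policy on the unit sphere.

    z        hidden parameter vector (the policy only sees noisy rewards)
    horizon  total number of pulls T
    The cycle-c schedule (r exploration pulls, then c greedy pulls) and the
    Gaussian noise are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    r = len(z)
    best = math.sqrt(sum(v * v for v in z))  # expected reward of the arm z/||z||
    sums = [0.0] * r    # total observed reward on each basis arm
    pulls = [0] * r     # number of exploration pulls of each basis arm
    regret = 0.0
    t = 0
    cycle = 0
    while t < horizon:
        cycle += 1
        # Exploration: pull each standard basis vector e_k once.
        for k in range(r):
            if t >= horizon:
                break
            sums[k] += z[k] + rng.gauss(0.0, noise_sd)
            pulls[k] += 1
            regret += best - z[k]
            t += 1
        # Estimate Z coordinate-wise from the exploration rewards only.
        z_hat = [s / n if n else 0.0 for s, n in zip(sums, pulls)]
        norm = math.sqrt(sum(v * v for v in z_hat))
        if norm == 0.0:
            arm = [1.0] + [0.0] * (r - 1)  # arbitrary fallback arm
        else:
            arm = [v / norm for v in z_hat]  # greedy arm on the unit sphere
        expected = sum(a * b for a, b in zip(arm, z))
        # Exploitation: play the greedy arm `cycle` times.
        for _ in range(cycle):
            if t >= horizon:
                break
            regret += best - expected
            t += 1
    return regret


if __name__ == "__main__":
    z = [1.0, 0.5, -0.5]  # hypothetical parameter vector, r = 3
    for horizon in (100, 1000, 10000):
        print(horizon, phased_policy_regret(z, horizon))
```

Under this assumed schedule, about sqrt(2T) cycles fit into T pulls, so the number of exploration pulls grows like r times sqrt(T), while the greedy arm improves as the estimate of Z sharpens.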

Paat Rusmevichientong, John N. Tsitsiklis

CORR 2008 | Cumulative Regret | Education | Policy | R-dimensional Random Vector |

Post Info

Added	10 Dec 2010
Updated	10 Dec 2010
Type	Journal
Year	2008
Where	CORR
Authors	Paat Rusmevichientong, John N. Tsitsiklis
