In the multi-armed bandit problem, an online algorithm must choose from a set of strategies in a sequence of n trials so as to minimize the total cost of the chosen strategies. Wh...
We consider the classical multi-armed bandit problem with Markovian rewards. When played an arm changes its state in a Markovian fashion while it remains frozen when not played. Th...
We consider an opportunistic spectrum access (OSA) problem where the time-varying condition of each channel (e.g., as a result of random fading or certain primary users' activ...
We study on-line decision problems where the set of actions that are available to the decision algorithm vary over time. With a few notable exceptions, such problems remained larg...
Robert D. Kleinberg, Alexandru Niculescu-Mizil, Yo...
Abstract. Algorithm selection is typically based on models of algorithm performance learned during a separate offline training sequence, which can be prohibitively expensive. In r...