We consider the multi-armed bandit problem under the PAC (“probably approximately correct”) model. It was shown by Even-Dar et al. [5] that, given n arms, a total of O((n/ε²) log(1/δ)) plays suffices to find an ε-optimal arm with probability at least 1−δ. Our contribution is a matching lower bound that holds for any sampling policy. We also generalize the lower bound to a Bayesian setting, and to the case where the statistics of the arms are known but the identities of the arms are not.
Shie Mannor, John N. Tsitsiklis
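
To make the upper bound cited in the abstract concrete, the following is a minimal sketch of a median-elimination style procedure of the kind attributed to Even-Dar et al. [5]. It is an illustration under stated assumptions, not the authors' code: the function names, the Bernoulli reward model, the per-round constants (ε/4, δ/2, and (4/ε_l²) ln(3/δ_l) pulls per arm), and the tie-breaking rule (keep the top half by empirical mean) are choices made for this sketch. Each round halves the set of surviving arms, so the total number of pulls is a geometric sum of order (n/ε²) log(1/δ).

```python
import math
import random


def median_elimination(pull, n_arms, epsilon, delta):
    """Sketch of median elimination; returns (chosen_arm, total_pulls).

    `pull(arm)` is a hypothetical callback returning a reward in [0, 1].
    The constants below are assumptions of this sketch.
    """
    survivors = list(range(n_arms))
    eps_l, delta_l = epsilon / 4.0, delta / 2.0
    total_pulls = 0
    while len(survivors) > 1:
        # Pulls per surviving arm this round: (4 / eps_l^2) * ln(3 / delta_l).
        m = math.ceil(4.0 / eps_l ** 2 * math.log(3.0 / delta_l))
        means = {}
        for arm in survivors:
            means[arm] = sum(pull(arm) for _ in range(m)) / m
            total_pulls += m
        # Keep the better half by empirical mean (ties broken by sort order),
        # which guarantees the survivor set shrinks every round.
        survivors.sort(key=lambda a: means[a], reverse=True)
        survivors = survivors[: math.ceil(len(survivors) / 2)]
        # Tighten accuracy and confidence for the next round.
        eps_l *= 0.75
        delta_l /= 2.0
    return survivors[0], total_pulls


def make_bernoulli_pull(true_means, seed=0):
    """Hypothetical helper: simulate Bernoulli arms with the given means."""
    rng = random.Random(seed)

    def pull(arm):
        return 1.0 if rng.random() < true_means[arm] else 0.0

    return pull


if __name__ == "__main__":
    pull = make_bernoulli_pull([0.2, 0.3, 0.9, 0.4])
    best, pulls = median_elimination(pull, n_arms=4, epsilon=0.2, delta=0.1)
    print(best, pulls)
```

The lower bound announced in the abstract says that no sampling policy can improve on this order of growth in n, ε, and δ; the sketch only illustrates the upper-bound side.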