
CCECE
2015
IEEE

An estimation based allocation rule with super-linear regret and finite lock-on time for time-dependent multi-armed bandit proce

The multi-armed bandit (MAB) problem has been an active area of research since the early 1930s. The majority of the literature restricts attention to i.i.d. or Markov reward processes. In this paper, the finite-parameter MAB problem with time-dependent reward processes is investigated. An upper confidence bound (UCB) based index policy, where the index is computed from the maximum-likelihood estimate of the unknown parameter, is proposed. This policy locks on to the optimal arm in finite expected time but has super-linear regret. As an example, the proposed index policy is used to minimize prediction error when each arm is an autoregressive moving average (ARMA) process.
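The abstract does not give the index formula, so the following is only a minimal Python sketch of the general idea, under purely hypothetical assumptions. It assumes a finite parameter set of two candidate models (`MODELS`), each arm being a rested Gaussian AR(1) process (a special case of ARMA) whose noise variance depends on the unknown parameter, and a reward equal to the negative one-step prediction-error variance. At each step the sketch computes the maximum-likelihood estimate over the finite set and plays the arm whose value under that estimate, plus a UCB-style bonus with a hypothetical weight `c`, is largest. All names (`MODELS`, `run_policy`, `c`) are illustrative; this is not the authors' policy, and it makes no claim about lock-on time or regret.

```python
import math
import random

# Hypothetical finite parameter set (not from the paper): under each theta,
# arm k is a Gaussian AR(1) process x_{n+1} = a[k] * x_n + w_n, w_n ~ N(0, sigma2[k]).
MODELS = {
    "theta1": {"a": [0.5, 0.8], "sigma2": [1.0, 2.0]},
    "theta2": {"a": [0.5, 0.8], "sigma2": [2.0, 1.0]},
}


def gaussian_loglik(x_prev, x_next, a, sigma2):
    """Log-density of x_next given x_prev under the AR(1) model (a, sigma2)."""
    resid = x_next - a * x_prev
    return -0.5 * (math.log(2 * math.pi * sigma2) + resid * resid / sigma2)


def mle(loglik):
    """Return the candidate parameter with the largest accumulated log-likelihood."""
    return max(loglik, key=loglik.get)


def run_policy(true_theta, horizon, c=1.0, seed=0):
    """Simulate an MLE-plus-UCB index policy; return (arm choices, final MLE)."""
    rng = random.Random(seed)
    n_arms = len(MODELS[true_theta]["a"])
    state = [0.0] * n_arms  # last observed value of each (rested) arm
    plays = [0] * n_arms
    loglik = {theta: 0.0 for theta in MODELS}
    choices = []
    for t in range(1, horizon + 1):
        if t <= n_arms:
            k = t - 1  # play each arm once to initialise
        else:
            s2 = MODELS[mle(loglik)]["sigma2"]
            # Index: negative one-step prediction-error variance under the
            # current MLE, plus a UCB-style exploration bonus.
            k = max(range(n_arms),
                    key=lambda i: -s2[i] + c * math.sqrt(math.log(t) / plays[i]))
        m_true = MODELS[true_theta]
        x_next = m_true["a"][k] * state[k] + rng.gauss(0.0, math.sqrt(m_true["sigma2"][k]))
        # Update every candidate model's log-likelihood with the observed transition.
        for theta, m in MODELS.items():
            loglik[theta] += gaussian_loglik(state[k], x_next, m["a"][k], m["sigma2"][k])
        state[k] = x_next
        plays[k] += 1
        choices.append(k)
    return choices, mle(loglik)
```

For example, `run_policy("theta1", 2000)` is expected to return the estimate `"theta1"` and to choose arm 0, the arm with the smaller noise variance under that model, in most rounds; the exact sequence depends on `seed`. Because every arm's observations carry information about the shared parameter, plays of a suboptimal arm still sharpen the estimate.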
Added 17 Apr 2016
Updated 17 Apr 2016
Type Journal
Year 2015
Where CCECE
Authors Prokopis C. Prokopiou, Peter E. Caines, Aditya Mahajan