An estimation based allocation rule with super-linear regret and finite lock-on time for time-dependent multi-armed bandit proce