While in general trading off exploration and exploitation in reinforcement learning is hard, under some formulations relatively simple solutions exist. Optimal decision thresholds ...
—We consider a multiarmed bandit problem where the expected reward of each arm is a linear function of an unknown scalar with a prior distribution. The objective is to choose a s...
Adam J. Mersereau, Paat Rusmevichientong, John N. ...
We consider the classical multi-armed bandit problem with Markovian rewards. When played an arm changes its state in a Markovian fashion while it remains frozen when not played. Th...
Abstract. We consider the framework of stochastic multi-armed bandit problems and study the possibilities and limitations of strategies that explore sequentially the arms. The stra...
We consider a combinatorial generalization of the classical multi-armed bandit problem that is defined as follows. There is a given bipartite graph of M users and N M resources. F...