Sciweavers

SAS
2015
Springer

Exploration vs Exploitation with Partially Observable Gaussian Autoregressive Arms

We consider a restless bandit problem with Gaussian autoregressive arms, where the state of an arm is only observed when it is played and the state-dependent reward is collected. Since arms are only partially observable, a good decision policy needs to account for the fact that information about the state of an arm becomes increasingly obsolete while the arm is not being played. Thus, the decision maker faces a tradeoff between exploiting those arms that are believed to be currently the most rewarding (i.e., those with the largest conditional mean) and exploring arms with a high conditional variance. Moreover, one would like the decision policy to remain tractable despite the infinite state space, even in systems with many arms. A policy that gives some priority to exploration is the Whittle index policy, for which we establish structural properties. These motivate a parametric index policy that is computationally much simpler than the Whittle index but can still outperform the ...
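To make the tradeoff between conditional mean and conditional variance concrete, here is a minimal sketch under stated assumptions. It assumes each arm is a zero-mean AR(1) process, X_{t+1} = phi * X_t + sigma * eps_t with standard normal noise; the abstract only says "Gaussian autoregressive", so the order and the parameter names `phi` and `sigma` are assumptions. The function `illustrative_index` and its weight `theta` are hypothetical and are not the parametric index or the Whittle index from the paper; they only show how a score might trade off the two belief quantities.

```python
import math


def ar1_belief(last_obs, steps_since, phi, sigma):
    """Belief about an unplayed arm, assuming a zero-mean AR(1) model.

    Returns the conditional mean and variance of the arm's state
    `steps_since` steps after it was last observed at `last_obs`.
    """
    if steps_since < 1:
        raise ValueError("steps_since must be at least 1")
    # The mean decays geometrically towards the stationary mean (zero).
    mean = phi ** steps_since * last_obs
    # The variance accumulates one discounted noise term per unobserved step.
    var = sigma ** 2 * sum(phi ** (2 * j) for j in range(steps_since))
    return mean, var


def illustrative_index(mean, var, theta):
    """Hypothetical score: conditional mean plus theta times the std. deviation."""
    return mean + theta * math.sqrt(var)


def choose_arm(beliefs, theta):
    """Pick the position of the arm with the largest illustrative score."""
    return max(
        range(len(beliefs)),
        key=lambda i: illustrative_index(beliefs[i][0], beliefs[i][1], theta),
    )


# Arm 0 was observed one step ago, arm 1 two steps ago, both at state 2.0.
beliefs = [
    ar1_belief(2.0, 1, phi=0.5, sigma=1.0),
    ar1_belief(2.0, 2, phi=0.5, sigma=1.0),
]
greedy_choice = choose_arm(beliefs, theta=0.0)    # pure exploitation
curious_choice = choose_arm(beliefs, theta=10.0)  # heavy weight on uncertainty
```

With `theta = 0` the score reduces to the conditional mean, so the most recently observed arm wins; a large `theta` favours the arm whose information is older and whose conditional variance is therefore larger.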
Julia Kuhn, Michel Mandjes, Yoni Nazarathy
Added 17 Apr 2016
Updated 17 Apr 2016
Type Journal
Year 2015
Where SAS