We consider a restless bandit problem with Gaussian autoregressive arms, where the state of an arm is observed only when it is played and the state-dependent reward is collected. Since arms are only partially observable, a good decision policy must account for the fact that information about the state of an arm becomes increasingly obsolete while the arm is not being played. Thus, the decision maker faces a tradeoff between exploiting the arms currently believed to be the most rewarding (i.e., those with the largest conditional mean) and exploring arms with a high conditional variance. Moreover, the decision policy should remain tractable despite the infinite state space, including in systems with many arms. A policy that gives some priority to exploration is the Whittle index policy, for which we establish structural properties. These motivate a parametric index policy that is computationally much simpler than the Whittle index but can still outperform the ...
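To make the notion of obsolescence concrete, consider a minimal sketch under an assumed zero-mean first-order autoregression for arm $i$, $X^i_{t+1} = \phi_i X^i_t + \varepsilon^i_t$ with i.i.d. $\varepsilon^i_t \sim \mathcal{N}(0,\sigma_i^2)$ and $\phi_i \in (-1,1)$ (the symbols $\phi_i$ and $\sigma_i^2$ are illustrative notation, not taken from the abstract). If the arm was last played $k$ rounds ago and its state was then observed to be $x$, the conditional law of its current state is Gaussian with
\[
\mathbb{E}\bigl[X^i_t \mid x, k\bigr] = \phi_i^{\,k}\, x,
\qquad
\operatorname{Var}\bigl(X^i_t \mid x, k\bigr) = \sigma_i^2 \sum_{j=0}^{k-1} \phi_i^{\,2j},
\]
so the conditional mean contracts toward the stationary mean while the conditional variance grows toward the stationary variance $\sigma_i^2/(1-\phi_i^2)$ as $k$ increases. This is the sense in which information about an unplayed arm becomes obsolete, and it is what an index policy must weigh against exploiting arms with a large conditional mean.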