Abstract—This paper considers maximizing throughput utility in a multi-user network with partially observable Markov ON/OFF channels. Instantaneous channel states are never known, and all control decisions are based on information provided by ACK/NACK feedback from past transmissions. The system can be viewed as a restless multi-armed bandit problem with a concave objective function of the time-average reward vector. Such problems are generally intractable. However, we provide an approximate solution by optimizing the concave objective over a non-trivial inner bound on the network performance region, where the inner bound is constructed by randomizing over well-designed stationary policies. Using a new frame-based Lyapunov drift argument, we design a joint admission control and channel selection policy that stabilizes the network with throughput utility that can be made arbitrarily close to the optimal value over the inner performance region. Our problem has applications in limited channel probing i...
Chih-Ping Li, Michael J. Neely
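To make the system model concrete, the sketch below illustrates the standard belief-state update that a controller can maintain for a single two-state (Gilbert-Elliott) ON/OFF Markov channel when it never observes the channel directly and learns only from ACK/NACK feedback, as described in the abstract. This is a minimal illustrative sketch, not the paper's policy; the function name and the transition-probability parameters p_on_on and p_off_on are assumptions introduced here for exposition.

```python
from typing import Optional


def update_belief(belief: float, transmitted: bool, ack: Optional[bool],
                  p_on_on: float, p_off_on: float) -> float:
    """Return P(channel is ON in the next slot | feedback history).

    belief      -- current probability that the channel is ON
    transmitted -- whether we transmitted on this channel in the current slot
    ack         -- ACK/NACK feedback if transmitted (True = ACK), else None
    p_on_on     -- P(next state ON | current state ON)
    p_off_on    -- P(next state ON | current state OFF)
    """
    if transmitted:
        # ACK reveals the channel was ON this slot; NACK reveals it was OFF.
        # The next-slot belief is then just the one-step transition probability.
        return p_on_on if ack else p_off_on
    # No transmission gives no new information: propagate the Markov chain.
    return belief * p_on_on + (1.0 - belief) * p_off_on


if __name__ == "__main__":
    w = 0.5  # initial belief that the channel is ON
    w = update_belief(w, transmitted=True, ack=True, p_on_on=0.8, p_off_on=0.2)
    print(w)  # 0.8: ACK revealed ON, so next-slot belief is p_on_on
    w = update_belief(w, transmitted=False, ack=None, p_on_on=0.8, p_off_on=0.2)
    print(w)  # 0.8*0.8 + 0.2*0.2 = 0.68
```

Under these assumptions, the belief vector across users summarizes the ACK/NACK history, which is why the problem can be viewed as a restless multi-armed bandit with partially observed states.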