Adaptive Learning of Transmission Control Policies for MIMO Fading Channels under Delay Constraint

15 years 27 days ago


This paper addresses learning-based adaptive resource allocation for wireless MIMO channels with Markovian fading. The problem is posed as a constrained Markov decision process whose goal is to minimize the average transmission cost (such as the transmission power) subject to a constraint on the average holding cost (such as the transmitter delay). A standard Q-learning algorithm is employed to adaptively find the optimal policy for unknown channel/traffic statistics; its convergence properties are discussed, and it is shown that it can compute the optimal policy relatively quickly even for rather large state spaces. To further improve the convergence rate of standard Q-learning, we establish several structural results on the optimal policies. We show that the optimal transmission policy is monotonic in the buffer occupancy. This permits us to utilize the supermodularity of the Q-factors and form a structured Q-learning algorithm that increases the convergence rate with respect to the...
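
As a rough illustration of the kind of Q-learning described in the abstract, the sketch below learns a transmission policy for a toy buffer/channel model. It is not the authors' algorithm. It assumes a discounted cost rather than the average cost studied in the paper, folds the delay constraint into the objective through a fixed Lagrange multiplier `lam`, and uses a hypothetical two-state Markov channel with made-up power costs and arrival probability. All function and parameter names are illustrative.

```python
import random


def q_learning_buffer(num_steps=20000, buffer_size=5, lam=1.0, gamma=0.9,
                      epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy buffer/channel model (all values hypothetical)."""
    rng = random.Random(seed)
    channels = (0, 1)            # assumed channel states: 0 = bad, 1 = good
    stay_prob = 0.8              # assumed probability the channel state persists
    power = {0: 2.0, 1: 1.0}     # assumed power scaling per channel state
    arrival_prob = 0.5           # assumed probability of one packet arrival per slot

    # State = (buffer occupancy b, channel h); action a = packets sent, 0..b.
    states = [(b, h) for b in range(buffer_size + 1) for h in channels]
    Q = {s: [0.0] * (s[0] + 1) for s in states}
    visits = {s: [0] * (s[0] + 1) for s in states}

    b, h = 0, 0
    for _ in range(num_steps):
        q_row = Q[(b, h)]
        # Epsilon-greedy exploration over the feasible actions (costs are minimized).
        if rng.random() < epsilon:
            a = rng.randrange(len(q_row))
        else:
            a = min(range(len(q_row)), key=q_row.__getitem__)

        # Lagrangian one-step cost: convex power cost plus weighted holding cost.
        cost = power[h] * a ** 2 + lam * (b - a)

        # Simulate the next state: Bernoulli arrival, two-state Markov channel.
        arrival = 1 if rng.random() < arrival_prob else 0
        nb = min(b - a + arrival, buffer_size)
        nh = h if rng.random() < stay_prob else 1 - h

        # Standard Q-learning update with a 1/n step size per state-action pair.
        visits[(b, h)][a] += 1
        alpha = 1.0 / visits[(b, h)][a]
        target = cost + gamma * min(Q[(nb, nh)])
        q_row[a] += alpha * (target - q_row[a])

        b, h = nb, nh

    # Greedy policy: the cost-minimizing action in each state.
    policy = {s: min(range(len(q)), key=q.__getitem__) for s, q in Q.items()}
    return Q, policy


if __name__ == "__main__":
    _, policy = q_learning_buffer()
    for h in (0, 1):
        print("channel", h, [policy[(b, h)] for b in range(6)])
```

Printing the learned policy per channel state makes it possible to inspect whether the number of packets sent tends to grow with buffer occupancy, which is the monotone structure the abstract refers to. A finite run of this sketch does not guarantee that the learned table is exactly monotone.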

Dejan V. Djonin, Vikram Krishnamurthy

Real-time Traffic