In cellular telephone systems, an important problem is to dynamically allocate the communication resource (channels) so as to maximize service in a stochastic caller environment. This problem is naturally formulated as a dynamic programming problem, and we use a reinforcement learning (RL) method to find dynamic channel allocation policies that are better than previous heuristic solutions. The policies obtained perform well for a broad variety of call traffic patterns. We present results on a large cellular system with approximately 49^49 states.

In cellular communication systems, an important problem is to allocate the communication resource (bandwidth) so as to maximize the service provided to a set of mobile callers whose demand for service changes stochastically. A given geographical area is divided into mutually disjoint cells, and each cell serves the calls that are within its boundaries (see Figure 1a). The total system bandwidth is divided into channels, with each channel centered around a ...
Satinder P. Singh, Dimitri P. Bertsekas