This paper describes the Q-routing algorithm for packet routing, in which a reinforcement learning module is embedded into each node of a switching network. Only local communication is used by each node to keep accurate statistics on which routing decisions lead to minimal delivery times. In simple experiments involving a 36-node, irregularly connected network, Q-routing proves superior to a nonadaptive algorithm based on precomputed shortest paths and is able to route e ciently even when critical aspects of the simulation, such as the network load, are allowed to vary dynamically. The paper concludes with a discussion of the tradeo between discovering shortcuts and maintaining stable policies.
Justin A. Boyan, Michael L. Littman