Reinforcement learning is a framework in which an agent learns behavior through exploration and exploitation, without prior knowledge of the task or the environment. Striking a balance between exploration and exploitation is one of the key problems of action selection in reinforcement learning. Exploitation drives the agent quickly toward a locally optimal policy, whereas excessive exploration degrades the performance of the algorithm, although it may improve learning and help the agent escape from a locally optimal policy. Recently, the human immune system has attracted researchers' interest because its mechanisms can be exploited for information processing in complex cognitive systems. In this paper, we transplant several immune mechanisms into the basic Q-learning algorithm and recast Q-learning as a search for the optimal solution in combinatorial optimization. Experiments show that the improved Q-learning converges more quickly than Q-learning or Boltzmann explorat...
Zhengqiao Ji, Q. M. Jonathan Wu, Maher A. Sid-Ahme
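The abstract does not spell out the immune-inspired algorithm, so the following is only a minimal sketch of the baseline it is compared against: tabular Q-learning with Boltzmann (softmax) action selection. In this baseline, the temperature controls the exploration/exploitation balance described above. The chain environment, the function names `boltzmann_action` and `q_learning_chain`, and all parameter values (`alpha`, `gamma`, `temperature`, episode counts) are hypothetical illustrations. None of them are taken from the paper.

```python
import math
import random


def boltzmann_action(q_values, temperature, rng):
    """Sample an action with probability proportional to exp(Q / T)."""
    m = max(q_values)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    return rng.choices(range(len(q_values)), weights=weights, k=1)[0]


def q_learning_chain(n_states=5, episodes=300, alpha=0.1, gamma=0.9,
                     temperature=0.5, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain (illustration only).

    Action 0 moves left, action 1 moves right. Reaching the rightmost
    state yields reward 1 and ends the episode; all other rewards are 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap on steps per episode
            a = boltzmann_action(q[s], temperature, rng)
            s_next = max(0, s - 1) if a == 0 else min(goal, s + 1)
            r = 1.0 if s_next == goal else 0.0
            # Terminal transitions do not bootstrap from the next state.
            target = r if s_next == goal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if s == goal:
                break
    return q


q = q_learning_chain()
greedy = [row.index(max(row)) for row in q[:-1]]  # greedy action per non-terminal state
print(greedy)
```

Lowering `temperature` makes `boltzmann_action` behave more greedily (more exploitation). Raising it pushes the choice toward uniform random (more exploration). This is the trade-off that the immune-inspired method is reported to handle better.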