This paper treats the combined use of reinforcement learning and simulated annealing. Most simulated annealing methods rely on heuristically chosen temperature bounds as the basis of annealing. Here, a theoretically established approach is presented, tailored to reinforcement learning that follows a Softmax action selection policy. An application example in agent-based routing is also presented.
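
To make the role of the temperature concrete, the following is a minimal sketch of Softmax (Boltzmann) action selection combined with a cooling schedule. It is an illustration under stated assumptions, not the method of this paper: the function names, the geometric cooling rule, and the bounds `t_max` and `t_min` are hypothetical placeholders, and the theoretically established bounds are not reproduced here.

```python
import math
import random


def softmax_probabilities(q_values, temperature):
    """Return Softmax (Boltzmann) action probabilities for the given action values."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(q_values, temperature, rng=random):
    """Sample an action index according to the Softmax probabilities."""
    probs = softmax_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def geometric_schedule(t_max, t_min, steps):
    """Yield `steps` temperatures decreasing geometrically from t_max to t_min.

    Assumption: t_max and t_min are placeholder bounds chosen by the caller;
    the geometric rule is a common heuristic, not the paper's schedule.
    """
    if steps < 2:
        yield t_max
        return
    ratio = (t_min / t_max) ** (1.0 / (steps - 1))
    t = t_max
    for _ in range(steps):
        yield t
        t *= ratio


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]  # hypothetical action-value estimates
    for t in geometric_schedule(t_max=10.0, t_min=0.1, steps=5):
        print(round(t, 3), [round(p, 3) for p in softmax_probabilities(q, t)])
```

At a high temperature the probabilities are close to uniform, so the agent explores; as the temperature falls, the probability mass concentrates on the action with the largest estimated value. This is why the choice of temperature bounds determines how the policy moves from exploration to exploitation.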