A good annealing schedule is necessary for simulated annealing to perform well on combinatorial optimization problems. In this paper, we pose the simulated annealing task decision-theoretically for the first time, allowing the user to explicitly define utilities of time and solution quality. We then demonstrate the application of reinforcement learning techniques to approximately optimal annealing control, using traveling salesman, clustered traveling salesman, and scheduling problems. Although many means of automating control of annealing temperatures have been proposed, our techniques require no domain-specific knowledge of problems and provide a natural means of expressing time versus quality tradeoffs. Finally, we discuss alternate abstractions for future decision-theoretic variants.
Todd W. Neller, Christopher J. La Pilla