We propose a general-purpose stochastic optimization algorithm, called the annealing stochastic approximation Monte Carlo (ASAMC) algorithm, for neural network training. ASAMC can be regarded as a space-annealing version of the stochastic approximation Monte Carlo (SAMC) algorithm. Under mild conditions, we show that ASAMC converges weakly at a rate of Ω(1/√t) toward a neighboring set (in the space of energy) of the global minimizers. ASAMC is compared with simulated annealing, SAMC, and the BFGS algorithm for training multilayer perceptrons (MLPs) on a number of examples. The numerical results indicate that ASAMC outperforms the other algorithms in both training and test errors. Like other stochastic algorithms, ASAMC requires longer training time than gradient-based algorithms do. It provides, however, an efficient approach to training MLPs whose energy landscape is rugged.

Keywords Back-propagation · Convergence rate · Markov chain Monte Carlo · Multiple layer perceptron · Simulated...
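To make the space-annealing idea concrete, the following is a minimal illustrative sketch, not the authors' implementation: it combines a SAMC-style weight update over an energy partition with a shrinking energy ceiling that restricts sampling to low-energy subregions. The function names (`energy`, `propose`), the `margin` and `t0` parameters, and the random-walk proposal are assumptions made for illustration only; the paper's algorithm additionally adjusts the desired sampling distribution to the retained subregions and requires the gain sequence to satisfy standard stochastic-approximation conditions.

```python
import numpy as np

def asamc_sketch(energy, propose, x0, thresholds, n_iter=10_000,
                 t0=1_000.0, margin=1.0, seed=0):
    """Toy sketch of space-annealing SAMC (assumed form, for illustration).

    energy(x)      : objective to minimize (e.g., MLP training error)
    propose(x,rng) : user-supplied proposal, e.g., a Gaussian random walk
    thresholds     : increasing energy cutoffs defining the subregions
    margin         : annealing parameter; moves above best_e + margin are rejected
    """
    rng = np.random.default_rng(seed)
    m = len(thresholds) + 1
    log_theta = np.zeros(m)            # log-weights, one per energy subregion
    pi = np.full(m, 1.0 / m)           # desired sampling frequencies
    x = x0
    e = energy(x)
    best_x, best_e = x, e

    def region(u):                     # index of the subregion containing energy u
        return int(np.searchsorted(thresholds, u))

    for t in range(1, n_iter + 1):
        gamma = t0 / max(t0, t)        # stochastic-approximation gain sequence
        y = propose(x, rng)
        ey = energy(y)
        # Space annealing: only consider moves below the shrinking energy ceiling.
        if ey <= best_e + margin:
            i, j = region(e), region(ey)
            # SAMC-style acceptance ratio using the subregion log-weights.
            log_r = (log_theta[i] - log_theta[j]) + (e - ey)
            if np.log(rng.random()) < min(0.0, log_r):
                x, e = y, ey
        # Update the weight of the subregion currently being visited.
        k = region(e)
        log_theta += gamma * ((np.arange(m) == k) - pi)
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e
```

In the neural-network setting described in the abstract, `x` would be the vector of MLP connection weights and `energy` the training error, so the sampler gradually concentrates on subregions near the lowest energy found so far.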