Production scheduling, the problem of sequentially con guring a factory to meet forecasted demands, is a critical problem throughout the manufacturing industry. The requirement of maintaining product inventories in the face of unpredictable demand and stochastic factory output makes standard scheduling models, such as job-shop, inadequate. Currently applied algorithms, such as simulated annealing and constraint propagation, must employ ad-hoc methods such as frequent replanning to cope with uncertainty. In this paper, we describe a Markov Decision Process MDP formulation of production scheduling which captures stochasticity in both production and demands. The solution to this MDP is a value function which can be used to generate optimal scheduling decisions online. A simpleexampleillustrates the theoretical superiority of this approach over replanning-based methods. We then describe an industrial application and two reinforcement learningmethodsfor generating an approximate value func...
Jeff G. Schneider, Justin A. Boyan, Andrew W. Moor