In off-line handwriting recognition, classifiers based on hidden Markov models (HMMs) have become very popular. However, while well-established training algorithms such as the Baum-Welch procedure optimize the transition and output probabilities of a given HMM architecture, the architecture itself, and in particular the number of states, must be chosen “by hand”. The number of training iterations and the form of the output distributions must also be defined by the system designer. In this paper we examine several optimization strategies for an HMM classifier that works with continuous feature values and is trained with the Baum-Welch algorithm. The free parameters of the optimization procedure introduced in this paper are the number of states of a model, the number of training iterations, and the number of Gaussian mixture components for each state. The proposed optimization strategies are evaluated on a handwritten word recognition task.
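To show what the "number of Gaussian mixture components per state" parameter controls, here is a minimal sketch of the output density of a single state in a continuous-density HMM. The state emits a feature vector x with density given by a weighted sum of K Gaussians, and K is the quantity being optimized. The diagonal covariances and the function names are illustrative assumptions, not the implementation used in this paper. The sum is evaluated in the log domain (log-sum-exp) to avoid numerical underflow.

```python
import math
from typing import Sequence


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float],
                      var: Sequence[float]) -> float:
    """log N(x; mean, diag(var)) for a diagonal-covariance Gaussian (assumed form)."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def log_mixture_density(x: Sequence[float],
                        weights: Sequence[float],
                        means: Sequence[Sequence[float]],
                        variances: Sequence[Sequence[float]]) -> float:
    """log b_j(x) = log sum_k c_k N(x; mu_k, Sigma_k) for one HMM state j.

    len(weights) is the number of mixture components K; the weights are
    assumed to be positive and to sum to 1.
    """
    terms = [
        math.log(c) + log_gaussian_diag(x, mu, var)
        for c, mu, var in zip(weights, means, variances)
    ]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))
```

Raising K lets each state model more complex feature distributions, but each extra component adds a weight plus a mean and a variance vector to be estimated by Baum-Welch. This trade-off is why K is treated as a free parameter rather than fixed in advance.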