This paper describes a technique for learning both the number of states and the topology of Hidden Markov Models from examples. The induction process starts with the most specific model consistent with the training data and generalizes by successively merging states. Both the choice of states to merge and the stopping criterion are guided by the Bayesian posterior probability. We compare our algorithm with the Baum-Welch method of estimating fixed-size models, and find that it can induce minimal HMMs from data in cases where fixed-size estimation does not converge or requires redundant parameters to converge.
Andreas Stolcke, Stephen M. Omohundro
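
To make the merging procedure concrete, the following is a minimal Python sketch of HMM induction by state merging, not the paper's implementation: the count-pooling merge, the size-penalizing prior in log_posterior, and all names (initial_model, merge_states, induce, prior_weight) are simplifying assumptions, and the greedy search re-scores every candidate merge from scratch rather than using the incremental evaluation a practical system would need. Training strings are assumed non-empty.

```python
import itertools
import math
from collections import defaultdict

def initial_model(sequences):
    """Most specific HMM: a dedicated chain of states per training string."""
    trans = defaultdict(lambda: defaultdict(float))  # state -> state -> count
    emit = defaultdict(lambda: defaultdict(float))   # state -> symbol -> count
    start, final = defaultdict(float), defaultdict(float)
    state = 0
    for seq in sequences:
        prev = None
        for sym in seq:
            emit[state][sym] += 1.0
            if prev is None:
                start[state] += 1.0
            else:
                trans[prev][state] += 1.0
            prev, state = state, state + 1
        final[prev] += 1.0
    return trans, emit, start, final

def merge_states(model, keep, gone):
    """Merge state `gone` into `keep` by pooling counts and redirecting arcs."""
    trans, emit, start, final = model
    ren = lambda s: keep if s == gone else s
    nt = defaultdict(lambda: defaultdict(float))
    ne = defaultdict(lambda: defaultdict(float))
    ns, nf = defaultdict(float), defaultdict(float)
    for s, d in trans.items():
        for t, c in d.items():
            nt[ren(s)][ren(t)] += c
    for s, d in emit.items():
        for sym, c in d.items():
            ne[ren(s)][sym] += c
    for s, c in start.items():
        ns[ren(s)] += c
    for s, c in final.items():
        nf[ren(s)] += c
    return nt, ne, ns, nf

def _norm(d):
    z = sum(d.values())
    return {k: v / z for k, v in d.items()} if z else {}

def log_likelihood(model, sequences):
    """Forward-algorithm log likelihood; counts are normalized on the fly."""
    trans, emit, start, final = model
    p_start = _norm(start)
    p_emit = {s: _norm(d) for s, d in emit.items()}
    def out_mass(s):  # outgoing mass is shared between arcs and stopping
        return sum(trans.get(s, {}).values()) + final.get(s, 0.0)
    ll = 0.0
    for seq in sequences:
        alpha = {s: p * p_emit[s].get(seq[0], 0.0) for s, p in p_start.items()}
        for sym in seq[1:]:
            new = defaultdict(float)
            for s, a in alpha.items():
                if a > 0.0:
                    for t, c in trans.get(s, {}).items():
                        new[t] += a * (c / out_mass(s)) * p_emit[t].get(sym, 0.0)
            alpha = new
        p = sum(a * final.get(s, 0.0) / out_mass(s)
                for s, a in alpha.items() if out_mass(s) > 0.0)
        if p <= 0.0:
            return float("-inf")
        ll += math.log(p)
    return ll

def log_posterior(model, sequences, prior_weight=1.0):
    """Stand-in posterior: log likelihood plus a prior penalizing model size."""
    return log_likelihood(model, sequences) - prior_weight * len(model[1])

def induce(sequences, prior_weight=1.0):
    """Greedy best-first merging; stop when no merge raises the posterior."""
    model = initial_model(sequences)
    score = log_posterior(model, sequences, prior_weight)
    while True:
        best = None
        for keep, gone in itertools.combinations(list(model[1]), 2):
            cand = merge_states(model, keep, gone)
            s = log_posterior(cand, sequences, prior_weight)
            if best is None or s > best[0]:
                best = (s, cand)
        if best is None or best[0] <= score:
            return model
        score, model = best
```

Under these assumptions, a call such as induce(["ab", "abab", "ababab"], prior_weight=0.5) should collapse the three initial chains into a small model looping over "a" and "b"; prior_weight trades off fit against model size and thus controls how aggressively states are merged.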