The most popular model used in automatic speech recognition is the hidden Markov model (HMM). Although good performance has been obtained with such models, there are well-known limitations in their ability to model speech. A variety of modifications to the standard HMM topology have been proposed to handle these problems. One approach is the factorial HMM. This paper introduces a new form of factorial HMM that makes use of transformation streams. The new scheme is a generalisation of the standard factorial HMM and of other related schemes in speech processing. A particular form of this model, the HMM error model (HEM), is described in detail. The HEM is evaluated on two standard large-vocabulary, speaker-independent speech recognition tasks. On both tasks, significant reductions in word error rate are obtained relative to standard HMM-based systems.
M. J. F. Gales