Semi-continuous acoustic models, where the output distributions for all Hidden Markov Model states share a common codebook of Gaussian density functions, are a well-known and proven technique for reducing computation in automatic speech recognition. However, the size of the parameter files, and thus their memory footprint at runtime, can be very large. We demonstrate how non-linear quantization can be combined with a mixture weight distribution pruning technique to halve the size of the models with minimal performance overhead and no increase in error rate.
David Huggins-Daines, Alexander I. Rudnicky