State-of-the-art large vocabulary continuous speech recognition (LVCSR) systems often combine outputs from multiple subsystems developed at different sites. Cross-system adaptation can be used as an alternative to direct hypothesis-level combination schemes such as ROVER. In conventional cross adaptation it is assumed that useful diversity among systems exists only at the acoustic level. However, complementary features among complex LVCSR systems also manifest themselves in other layers of the modelling hierarchy, e.g., at the subword and word levels. It is thus interesting to also cross adapt language models (LMs) to capture them. In this paper cross adaptation of multi-level LMs modelling both syllable and word sequences was investigated to improve LVCSR system combination. Significant error rate gains of 6.7% relative were obtained over ROVER and acoustic-model-only cross adaptation when combining 13 Chinese LVCSR subsystems used in the 2010 DARPA GALE evaluation.
Xunying Liu, Mark J. F. Gales, Philip C. Woodland