ACL
2012

Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining

Long-span features, such as syntax, can improve language models for tasks such as speech recognition and machine translation. However, these language models can be difficult to use in practice because of the time required to generate features for rescoring a large hypothesis set. In this work, we propose substructure sharing, which saves duplicate work in processing hypothesis sets with redundant hypothesis structures. We apply substructure sharing to a dependency parser and a part-of-speech tagger to obtain significant speedups, and further improve the accuracy of these tools through up-training. When using these improved tools in a language model for speech recognition, we obtain significant speed improvements with both N-best and hill-climbing rescoring, and show that up-training leads to a reduction in word error rate (WER).
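To make the idea of substructure sharing concrete, the sketch below shows one simplified form of it: N-best hypotheses often share word prefixes, so a left-to-right tagger can cache its state per prefix and reuse it instead of recomputing. This is an illustrative stand-in, not the authors' method. It assumes a tagger whose state depends only on the words seen so far, and `tag_step` and `tag_hypotheses` are hypothetical toy functions. The paper applies sharing to a dependency parser and a part-of-speech tagger, where the shared substructures may be more general than prefixes.

```python
from typing import Dict, List, Tuple

State = Tuple[str, ...]  # hypothetical tagger state: the tags assigned so far


def tag_step(state: State, word: str) -> State:
    """Hypothetical toy tagger step: the new tag depends on the word and the previous tag."""
    prev = state[-1] if state else "<s>"
    if word.endswith("ing"):
        tag = "VBG"
    elif prev == "DT":
        tag = "NN"
    elif word in {"the", "a"}:
        tag = "DT"
    else:
        tag = "X"
    return state + (tag,)


def tag_hypotheses(hyps: List[List[str]], share: bool = True) -> Tuple[List[State], int]:
    """Tag every hypothesis; with share=True, states for repeated prefixes are reused.

    Returns the final state per hypothesis and the number of tag_step calls made.
    """
    cache: Dict[Tuple[str, ...], State] = {}
    steps = 0
    results: List[State] = []
    for hyp in hyps:
        state: State = ()
        for i, word in enumerate(hyp):
            prefix = tuple(hyp[: i + 1])
            if share and prefix in cache:
                state = cache[prefix]  # reuse work done for an earlier hypothesis
                continue
            state = tag_step(state, word)
            steps += 1
            if share:
                cache[prefix] = state
        results.append(state)
    return results, steps


if __name__ == "__main__":
    nbest = [
        ["the", "dog", "is", "running"],
        ["the", "dog", "is", "sitting"],
        ["the", "dog", "was", "running"],
    ]
    print(tag_hypotheses(nbest, share=False)[1])  # 12 steps without sharing
    print(tag_hypotheses(nbest, share=True)[1])   # 7 steps with prefix sharing
```

Both modes return identical tag sequences; only the amount of repeated work changes, which is the property that makes sharing safe for rescoring.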
Ariya Rastrow, Mark Dredze, Sanjeev Khudanpur
Added 29 Sep 2012
Updated 29 Sep 2012
Type Journal
Year 2012
Where ACL