Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining


Download www.cs.jhu.edu

Long-span features, such as syntax, can improve language models for tasks such as speech recognition and machine translation. However, these language models can be difficult to use in practice because of the time required to generate features for rescoring a large hypothesis set. In this work, we propose substructure sharing, which saves duplicate work in processing hypothesis sets with redundant hypothesis structures. We apply substructure sharing to a dependency parser and a part-of-speech tagger to obtain significant speedups, and further improve the accuracy of these tools through up-training. When using these improved tools in a language model for speech recognition, we obtain significant speed improvements with both N-best and hill-climbing rescoring, and show that up-training leads to a reduction in word error rate (WER).
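The abstract does not spell out how substructure sharing is implemented. The following is only a minimal sketch of the general idea, assuming a greedy left-to-right tagger whose decision at each position depends solely on a small local window (previous, current, and next word plus the two previous tags). Under that assumption, hypotheses in an N-best list that share the same local window can reuse one classifier decision instead of recomputing it. `SharedTagger`, `toy_classify`, the window shape, and the example hypotheses are all hypothetical illustrations, not the authors' tagger or parser.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical local context: (prev word, current word, next word, tag[-2], tag[-1]).
Key = Tuple[str, str, str, str, str]


class SharedTagger:
    """Greedy left-to-right tagger that memoizes local decisions across hypotheses.

    Because each decision is a deterministic function of its Key, sharing the
    cache across an N-best list yields exactly the same tags as tagging each
    hypothesis independently, while skipping repeated classifier calls.
    """

    def __init__(self, classify: Callable[[Key], str]) -> None:
        self.classify = classify          # hypothetical local classifier
        self.cache: Dict[Key, str] = {}   # shared by all hypotheses
        self.calls = 0                    # actual classifier invocations

    def tag(self, words: List[str]) -> List[str]:
        tags: List[str] = []
        padded = ["<s>"] + words + ["</s>"]
        for i in range(len(words)):
            prev1 = tags[-1] if len(tags) >= 1 else "<s>"
            prev2 = tags[-2] if len(tags) >= 2 else "<s>"
            key = (padded[i], padded[i + 1], padded[i + 2], prev2, prev1)
            if key not in self.cache:     # only unseen substructures cost a call
                self.calls += 1
                self.cache[key] = self.classify(key)
            tags.append(self.cache[key])
        return tags


def toy_classify(key: Key) -> str:
    """Hypothetical stand-in for a trained classifier; looks at the current word."""
    word = key[1]
    if word in {"the", "a"}:
        return "DT"
    if word.endswith("ed"):
        return "VBD"
    return "NN"


# A tiny hypothetical N-best list with heavily overlapping hypotheses.
hyps = [
    "the cat jumped over the fence".split(),
    "the cat jumped over a fence".split(),
    "the cat jumped over the fans".split(),
]

tagger = SharedTagger(toy_classify)
for words in hyps:
    print(tagger.tag(words))
print(tagger.calls, sum(len(h) for h in hyps))  # 11 18
```

On this toy list, 18 token positions need only 11 classifier calls, because the shared prefix "the cat jumped over" (and the repeated window around "over the") is computed once. A real dependency parser or tagger has richer state, so what counts as a reusable substructure would depend on which parts of that state its features actually read.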

Ariya Rastrow, Mark Dredze, Sanjeev Khudanpur


ACL 2012 | Computational Linguistics | Dependency Parser | Language Models | Speed Improvements |


Added	29 Sep 2012
Updated	29 Sep 2012
Type	Conference
Year	2012
Where	ACL
Authors	Ariya Rastrow, Mark Dredze, Sanjeev Khudanpur

Sciweavers
