We examine some of the frequently disregarded subtleties of tokenization in Penn Treebank style, and present a new rule-based preprocessing toolkit that not only reproduces the Tr...
This paper presents a novel top-down headdriven parsing algorithm for data-driven projective dependency analysis. This algorithm handles global structures, such as clause and coor...
Resolving coordination ambiguity is a classic hard problem. This paper looks at coordination disambiguation in complex noun phrases (NPs). Parsers trained on the Penn Treebank are...
Shane Bergsma, David Yarowsky, Kenneth Ward Church
This paper presents a simple and effective approach to improve dependency parsing by using subtrees from auto-parsed data. First, we use a baseline parser to parse large-scale una...
Syntactic consistency is the preference to reuse a syntactic construction shortly after its appearance in a discourse. We present an analysis of the WSJ portion of the Penn Treeba...
We present algorithms for higher-order dependency parsing that are "third-order" in the sense that they can evaluate substructures containing three dependencies, and &qu...
In this work we learn clusters of contextual annotations for non-terminals in the Penn Treebank. Perhaps the best way to think about this problem is to contrast our work with that...
William P. Headden III, Eugene Charniak, Mark John...
This paper describes a parser which generates parse trees with empty elements in which traces and fillers are co-indexed. The parser is an unlexicalized PCFG parser which is guara...
This paper describes an algorithm for detecting empty nodes in the Penn Treebank (Marcus et al., 1993), finding their antecedents, and assigning them function tags, without access...
We present a method for automatic determiner selection, based on an existing language model. We train on the Penn Treebank and also use additional data from the North American New...