Statistical parsing of noun phrase (NP) structure has been hampered by a lack of goldstandard data. This is a significant problem for CCGbank, where binary branching NP derivations are often incorrect, a result of the automatic conversion from the Penn Treebank. We correct these errors in CCGbank using a gold-standard corpus of NP structure, resulting in a much more accurate corpus. We also implement novel NER features that generalise the lexical information needed to parse NPs and provide important semantic information. Finally, evaluating against DepBank demonstrates the effectiveness of our modified corpus and novel features, with an increase in
David Vadas, James R. Curran