We describe an approach to training a statistical parser from a bracketed corpus, and demonstrate its use in a software testing application that translates English specifications into an automated testing language. A grammar is not explicitly specified; the rules and contextual probabilities of occurrence are automatically generated from the corpus. The parser is extremely successful at producing and identifying the correct parse, and nearly deterministic in the number of parses that it produces. To compensate for undertraining, the parser also uses general, linguistic subtheories which aid in guessing some types of novel structures.
Mark A. Jones, Jason Eisner
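
As a rough illustration of what it means to generate rules and probabilities from a bracketed corpus rather than from a hand-written grammar, the sketch below reads rewrite rules off bracketed trees and estimates each rule's probability by relative frequency. This is a deliberately simplified stand-in: it uses plain context-free relative-frequency estimates, not the contextual probabilities the abstract describes, and the toy corpus, the bracketing format, and all function names are assumptions made for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; the bracketing format and labels are assumptions.
TOY_CORPUS = [
    "(S (NP the button) (VP is pressed))",
    "(S (NP the light) (VP blinks))",
]


def parse_brackets(text):
    """Parse one bracketed tree into nested (label, children) tuples.

    Leaves are plain strings (words).
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = read()
    if pos != len(tokens):
        raise ValueError("unexpected tokens after the tree")
    return tree


def count_rules(tree, counts):
    """Record one rule (label -> child symbols) per internal node."""
    label, children = tree
    # A child contributes its label if it is a subtree, or the word itself.
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[label][rhs] += 1
    for child in children:
        if isinstance(child, tuple):
            count_rules(child, counts)


def estimate(corpus):
    """Relative-frequency estimate of P(rhs | lhs) for every observed rule."""
    counts = defaultdict(Counter)
    for line in corpus:
        count_rules(parse_brackets(line), counts)
    probs = {}
    for lhs, rhs_counts in counts.items():
        total = sum(rhs_counts.values())
        for rhs, n in rhs_counts.items():
            probs[(lhs, rhs)] = n / total
    return probs


probs = estimate(TOY_CORPUS)
for (lhs, rhs), p in sorted(probs.items()):
    print(f"{lhs} -> {' '.join(rhs)}  {p:.2f}")
```

No rule is written by hand in this sketch: every rule and its probability comes from the trees. A model with contextual probabilities would additionally condition each rule on surrounding structure, which the sketch does not attempt.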