This paper presents Domain Relevance Estimation (DRE), a fully unsupervised text categorization technique based on the statistical estimation of the relevance of a text with respe...
We describe how simple, commonly understood statistical models, such as statistical dependency parsers, probabilistic context-free grammars, and word-to-word translation models, c...
This paper provides evidence for Genzel and Charniak's (2002) entropy rate principle, which predicts that the entropy of a sentence increases with its position in the text. W...
The morphology of Semitic languages is unique in the sense that the major word-formation mechanism is an inherently non-concatenative process of interdigitation, whereby two morph...
Accurate dependency recovery has recently been reported for a number of wide-coverage statistical parsers using Combinatory Categorial Grammar (CCG). However, overall figures give...
We present a new approach to intrinsic summary evaluation, based on initial experiments in van Halteren and Teufel (2003), which combines two novel aspects: comparison of informat...
The focus of research in text classification has expanded from simple topic identification to more challenging tasks such as opinion/modality identification. Unfortunately, the la...
Paraphrase recognition is a critical step for natural language interpretation. Accordingly, many NLP applications would benefit from high coverage knowledge bases of paraphrases. ...
We present a novel discriminative approach to parsing inspired by the large-margin criterion underlying support vector machines. Our formulation uses a factorization analogous to ...
Ben Taskar, Dan Klein, Mike Collins, Daphne Koller...