This work introduces a new approach to checking treebank consistency. Derivation trees based on a variant of Tree Adjoining Grammar are used to compare the annotation of word sequ...
This paper presents a new method to automatically add n-grams containing out-of-vocabulary (OOV) words to a baseline language model (LM), where these n-grams are sought to be gram...
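The abstract above starts from n-grams that contain out-of-vocabulary words. Below is a minimal sketch of only that candidate-extraction step, assuming tokenized sentences and a set holding the baseline LM vocabulary. The helpers `ngrams` and `oov_ngrams` are hypothetical illustrations. The truncated abstract does not say how the paper then selects or weights these n-grams, so the sketch leaves that part out.

```python
from typing import Iterable, List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token list (empty if too short)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def oov_ngrams(sentences: Iterable[List[str]], vocab: Set[str], n: int) -> Set[Tuple[str, ...]]:
    """Hypothetical helper: collect n-grams with at least one word outside the baseline vocabulary."""
    found: Set[Tuple[str, ...]] = set()
    for tokens in sentences:
        for gram in ngrams(tokens, n):
            # Keep the n-gram only if some word is unknown to the baseline LM.
            if any(word not in vocab for word in gram):
                found.add(gram)
    return found


if __name__ == "__main__":
    vocab = {"the", "model", "uses", "a"}
    sentences = [["the", "model", "uses", "a", "transformer"]]
    print(sorted(oov_ngrams(sentences, vocab, 2)))  # [('a', 'transformer')]
```

Only the bigram `('a', 'transformer')` qualifies, because "transformer" is the single word missing from the toy vocabulary.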
"Prose rhythm" is a widely observed but scarcely quantified phenomenon. We describe an information-theoretic model for measuring the regularity of lexical stress in English te...
A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dim...
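A short worked count shows why modeling the joint distribution directly runs into the curse of dimensionality. The numbers here are chosen for illustration only: a vocabulary $V$ of 100,000 words and sequences of 10 words. By the chain rule, the joint probability factorizes into conditional probabilities. A full table over every possible 10-word sequence, however, needs one free parameter per sequence, minus one for the normalization constraint:

$$P(w_1, \ldots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \ldots, w_{t-1})$$

$$|V|^{10} - 1 = (10^{5})^{10} - 1 = 10^{50} - 1$$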
Research on the discovery of terms from corpora has focused on word sequences whose recurrent occurrence in a corpus is indicative of their terminological status, and has not addr...
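The frequency-based approach the abstract attributes to prior work can be sketched as counting contiguous word sequences and keeping those that recur. In this minimal sketch, the helper `recurrent_sequences`, the sequence length `n`, and the threshold `min_count` are hypothetical choices. It is not the method the paper itself proposes.

```python
from collections import Counter
from typing import Dict, List, Tuple


def recurrent_sequences(sentences: List[List[str]], n: int, min_count: int) -> Dict[Tuple[str, ...], int]:
    """Hypothetical helper: count contiguous n-word sequences and keep those seen at least min_count times."""
    counts = Counter(
        tuple(tokens[i:i + n])
        for tokens in sentences
        for i in range(len(tokens) - n + 1)
    )
    # Recurrence is used as the (sole) signal of candidate term status.
    return {seq: c for seq, c in counts.items() if c >= min_count}


if __name__ == "__main__":
    corpus = [
        ["the", "neural", "network", "was", "trained"],
        ["a", "neural", "network", "with", "dropout"],
        ["the", "network", "was", "pruned"],
    ]
    print(recurrent_sequences(corpus, 2, 2))
    # {('neural', 'network'): 2, ('network', 'was'): 2}
```

The output includes "network was" alongside "neural network". Recurrence alone therefore lets non-terms through.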
The Wikipedia XML collection turned out to be rich in marked-up phrases as we carried out our INEX 2007 experiments. Assuming that a phrase occurs at the inline level of the markup...