We study the issue of porting a known NLP method to a language with little existing NLP resources, specifically Hebrew SVM-based chunking. We introduce two SVM-based methods – ...
This paper studies the impact of written language variations and the way it affects the capitalization task over time. A discriminative approach, based on maximum entropy models, ...
This paper describes a fully automatic twostage machine learning architecture that learns temporal relations between pairs of events. The first stage learns the temporal attribut...
Finding temporal and causal relations is crucial to understanding the semantic structure of a text. Since existing corpora provide no parallel temporal and causal annotations, we ...
Even in a massive corpus such as the Web, a substantial fraction of extractions appear infrequently. This paper shows how to assess the correctness of sparse extractions by utiliz...
This paper describes an extractive summarizer for educational science content called COGENT. COGENT extends MEAD based on strategies elicited from an empirical study with domain a...
Sebastian de la Chica, Faisal Ahmad, James H. Mart...
We present the design and evaluation of a translator’s amenuensis that uses comparable corpora to propose and rank nonliteral solutions to the translation of expressions from th...
Bogdan Babych, Anthony Hartley, Serge Sharoff, Olg...
We use an EM algorithm to learn user models in a spoken dialog system. Our method requires automatically transcribed (with ASR) dialog corpora, plus a model of transcription error...
When a word sense disambiguation (WSD) system is trained on one domain but applied to a different domain, a drop in accuracy is frequently observed. This highlights the importance...